Abstract

An insightful view into the design of traitor tracing codes should necessarily consider the worst case attacks that the
colluders can lead. This paper takes an information-theoretic point of view where the worst case attack is defined as the
collusion strategy minimizing the achievable rate of the traitor tracing code. Two different decoders are envisaged, the joint
decoder and the simple decoder, as recently defined by P. Moulin [1]. Several classes of colluders are defined with increasing
power. The worst case attack is derived for each class and each decoder when applied to Tardos' codes and a probabilistic
version of the Boneh-Shaw construction. This contextual study gives the real rates achievable by the binary probabilistic traitor
tracing codes. Attacks usually considered in the literature, such as majority or minority votes, are indeed largely suboptimal. This
article also shows the utmost importance of the time-sharing concept in probabilistic codes.
I. Introduction

This article deals with traitor tracing, which is also known as active fingerprinting, content serialization, user forensics or
transactional watermarking. A typical application is as follows: a video-on-demand server distributes personal
copies of the same content to n buyers. Some are dishonest users whose goal is to illegally redistribute a pirate copy. The
rights holder is interested in identifying these dishonest users. For this purpose, a unique user identifier consisting of a
sequence of m symbols is embedded in each video content thanks to a watermarking technique, thus producing n different
(although perceptually similar) copies. This allows tracing back which user has illegally redistributed his copy. However,
there might be a collusion of c dishonest users, c > 1. This collusion mixes their copies in order to forge a pirated content
which contains none of the identifiers but a mixture of them.

The traitor tracing code invented by Gabor Tardos in 2003 [2] is becoming more and more popular. This code is a probabilistic
weak traitor tracing code, where the probability of accusing an innocent is not null. Its performance is usually evaluated
in terms of the probability $P_{fa}$ of accusing an innocent and the probability $P_{fn}$ of missing all colluders. Most of the
articles dealing with the Tardos code aim at finding a tighter lower bound on the length of the code. In his seminal work,
G. Tardos shows that, in order to guarantee $P_{fa} < \epsilon_1$ and $P_{fn} < \epsilon_1^{c/4}$ as defined in the Boneh & Shaw problem [3], the
code length must satisfy $m > 100 c^2 \log(n/\epsilon_1)$. Many researchers found the constant 100 too arbitrary. Better approximations¹
are $4\pi^2$ [4], 85 [5], and 16 [6]. A main improvement came from the symmetric decoding [7]. Other works propose more
practical implementations of the Tardos code [8]. The reader will also find a pedagogical presentation of this code in [9].
Our article is very different from these past threads of studies as we give the theoretical performances of the code whatever
the accusation algorithm. In a nutshell, our work consists in applying the results of [1]. In this article, P. Moulin gives the
definition of capacity for the traitor tracing problem, providing exact capacity expressions for the blind model, i.e. when the
decoder knows in advance neither the number of colluders nor the particular collusion strategy followed by them.
So far, only bounds to the capacity had been derived by other authors (see references in [1]). In words, capacity is defined
as the maximum (over all traitor tracing codes) of the minimum (over all strategies allowed by the collusion model) of
an appropriate mutual information functional. Nevertheless, the problems of finding the best traitor tracing codes and the
optimal collusion attacks are left open, although some important hints are given in [1] and more recently in [10]. Our results
are not in the direction of solving this game-theoretic problem. We consider specific binary fingerprinting codes and seek
the collusion strategy minimizing the mutual information. Therefore, we cannot speak of the capacity of a given collusion
channel as in [1], but of the maximum achievable rate of a given binary code. Our results are mainly aimed at providing
more insight into the binary Tardos code, but the methodology can be easily extended, in general, to other code constructions
based on the same principles. In fact, as explained in the sequel, our study also deals with a probabilistic version of the
Boneh & Shaw code [3].

The goal of our study is twofold. First, it seems that an invariance property governs the design of the Tardos code: the
Markov lower bounds on code length derived in (2), |0], G) involve means and variances of the innocents and colluders 
scores which are invariant with respect to the collusion strategy. Therefore, this nuisance parameter unknown at the accusation 
side is no longer a problem since the bounds hold whatever its value. A priori, there is no collusion attack which is worse
than any other. This is, however, only true as far as the first and second order statistics of the scores are concerned. Higher order
statistics do not share this invariance property. Furthermore, this invariance property only holds for the decoder originally
suggested by G. Tardos. On the contrary, the achievable rate of the traitor tracing code is an appropriate measure to quantify
how damaging a given collusion process is, whatever the decoding algorithm. Therefore, we are looking for the worst case
collusion attack, i.e. the one minimizing this quantity for a given number of colluders. The code is deemed sound whenever its rate is
below this minimal mutual information, hence the term maximum achievable rate. These results clearly show that classical
assessment against, for instance, majority or minority attacks can largely overestimate the performance of the code because
these are far from being the worst collusion processes.

¹ Numbers are given for the non-symmetric decoding, where symbols '0' are discarded as in the original Tardos setup.

The second goal of this article is to show the practical importance of time-sharing when a binary Tardos code is
considered; this importance has already been highlighted in the theoretical derivations of P. Moulin [1]. Time-sharing is a concept
well known in multiuser information theory [11], by which a new code is constructed from two or more codes of different rates
by using each code in disjoint fractions of the time. In the Tardos code, the probability of having a symbol '1'
in a code sequence changes from one index i to another according to a given auxiliary random variable P, which is indeed
the "time-sharing" variable that selects the code to be used in each index. Therefore, the achievable rate of the codes studied
in this paper is defined as an expectation of a function over the time-sharing random variable P. It is very interesting to plot
this latter function with respect to P. Some attacks succeed in canceling this function over a range. Therefore, the support
and the values taken by the probability density function f(P) of the time-sharing variable are of utmost importance. An
appropriate time-sharing leads to huge improvements, provided the time-sharing sequence remains secret from the colluders.
Moreover, this study also shows that even when this sequence is disclosed, performing traitor tracing is still possible in
theory as the rate never exactly cancels. An interesting byproduct of our analysis is that it also covers the analysis
of binary traitor tracing codes without time-sharing, which has not been addressed before from the information-theoretic
viewpoint, especially in the case of the simple decoder.

We recently discovered that E. Amiri and G. Tardos [10], and independently Huang and Moulin [12], addressed the same
issues. However, relatively few of their results cover exactly our propositions:

• For the joint decoder, they succeeded in deriving, in a game-theoretic setting, the capacity-achieving parameter f(P).
This is indeed a probability mass function (pmf, i.e. the time-sharing variable is discrete) strongly dependent on the
collusion size. However, in a real case scenario one cannot foresee the exact number of colluders: at most, a maximum
collusion size can be anticipated. The code is guaranteed to perform well whenever the number of colluders is below
the predicted maximum; however, for bigger collusion sizes the code becomes unreliable. Furthermore, the numerical
computation of the optimal f(P) does not seem feasible for a large number of colluders [12]. This motivates the interest in
the study of the probabilistic traitor tracing code with a fixed continuous f(P) which, albeit suboptimal, does reasonably
well for any collusion size. Remarkably, the f(P) proposed by Tardos, which we study in this paper, seems to be very
close to the optimal f(P) when the number of colluders is very large, according to [12], and the asymptotic loss with
respect to the capacity is only within a small factor.

• For the simple decoder, Amiri and Tardos [10] considered a scenario where all colluder identities were disclosed except
one, and the decoder is looking for the identity of this unknown colluder. Our simple decoder is the one defined by P.
Moulin [1], which is very different and more realistic: no colluder has been caught yet, and the goal of the decoder is to make a
first accusation.

Sec. II introduces all the mathematical definitions and assumptions needed to derive the worst case attacks: the type of
traitor tracing code we are dealing with (Sec. II-B), the introduction of four different classes of colluders referred to as A,
B, C and D (Sec. II-C), and two possible accusation strategies based on the so-called joint and simple decoders (Sec. II-D).
This paper gives the worst case attacks that a given class of colluders can lead against a given family of decoders: joint
decoder in Sec. III and simple decoder in Sec. IV.

II. Model 

A. Notation 

First of all we summarize the most important notational conventions to be used throughout the paper. Random variables
and their realizations are denoted by capital and lowercase letters, respectively. Boldface letters denote column vectors.
Calligraphic letters are reserved for sets. $\Pr_X[x]$ is the probability that the discrete random variable X takes the value x.
The shorthand [m] will be used to denote the set of indices {1, . . . , m}. H(·) denotes the entropy of a discrete random
variable. $h_b(x) = -x\log(x) - (1-x)\log(1-x)$ is the binary entropy. $D_{KL}(\Pr_X \| \Pr_Y)$ is the Kullback-Leibler divergence
or relative entropy between the distributions of the random variables X and Y. log, the logarithm to the base 2, is preferably used in order to
give all rates and entropies in bits, whereas ln is the natural logarithm.
B. Binary probabilistic code with time-sharing

We briefly recall how the Tardos code is designed, as an example of a probabilistic code with time-sharing. The binary
code X is composed of n sequences of m bits. The sequence $\mathbf{X}_j = (X(j,1), \dots, X(j,m))^T$ identifying user j is composed
of m independent binary symbols, with $\Pr_{X(j,i)}[1] = p_i$, $\forall i \in [m]$. The auxiliary random variables $\{P_i\}_{i=1}^{m}$ are independent
and identically distributed in the range [0,1] according to the probability density function f(p): $P_i \sim f(p)$. The Tardos pdf
$f(p) = (\pi\sqrt{p(1-p)})^{-1}$ is symmetric around 1/2: $f(p) = f(1-p)$. It means that symbols '1' and '0' play a similar
role with probability p or 1 - p. Both the code X and the time-sharing sequence $\mathbf{p} = (p_1, \dots, p_m)^T$ must remain
secret parameters. In the original paper, the pdf is slightly different as it is defined on [t, 1 - t], where t > 0 is the cut-off parameter
fixed to 1/(300c). We do not consider this cut-off since the integrals are all well defined over (0, 1).
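As an illustration of this construction, the following minimal Python sketch draws a time-sharing sequence and a binary code. It relies on the fact that the Tardos pdf without cut-off is the Beta(1/2, 1/2) (arcsine) density; the function names, the seed and the sizes n = 20, m = 1000 are illustrative assumptions, not values taken from the paper.

```python
import random

def sample_time_sharing(m, rng):
    """Draw p_1..p_m i.i.d. from f(p) = 1/(pi*sqrt(p(1-p))), i.e. Beta(1/2, 1/2).

    No cut-off parameter t is applied, as in the text above.
    """
    return [rng.betavariate(0.5, 0.5) for _ in range(m)]

def generate_code(n, m, rng):
    """Return (X, p) where X[j][i] is the i-th bit of user j, with Pr[X(j,i)=1] = p[i]."""
    p = sample_time_sharing(m, rng)
    X = [[1 if rng.random() < p[i] else 0 for i in range(m)] for _ in range(n)]
    return X, p

rng = random.Random(0)               # fixed seed, only to make the sketch reproducible
X, p = generate_code(n=20, m=1000, rng=rng)
mean_p = sum(p) / len(p)             # close to 1/2, since f is symmetric around 1/2
```

In a real deployment both `X` and `p` would have to be kept secret, as stated above.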

This definition might encompass more fingerprinting codes than the Tardos one. Although its construction is very different,
the Boneh & Shaw code (BS) shares a similar statistical structure [3]. When n users are addressed, the ratio P of symbols '1'
in the code symbols $\{X(j,i)\}_{j=1}^{n}$ for a given index $i \in [m]$ can be considered as a discrete random variable whose probability
mass function is given by $\Pr_P[k/n] = 1/n$, $\forall k \in [n]$. This means that the sequence identifying user j is composed of m
binary symbols, with $\Pr_{X(j,i)}[1] = p_i$, $\forall i \in [m]$, where $p_i \in \{1/n, 2/n, \dots, (n-1)/n, 1\}$ is chosen equiprobably. Therefore,
the resemblance with the Tardos construction is clear: as n goes to infinity, this code can be constructed as a Tardos code
but with a flat pdf over [0, 1]: $f(p) = 1$, $\forall p \in [0, 1]$.

However, the difference between the Tardos and the BS codes is that the rate of the latter is imposed by construction. Let
us define the rate R of a fingerprinting code by $R = \log(n)/m$. In a BS code, the rate is known to be $\log(n)/(r(n-1))$,
where r is the so-called "replication factor" [3]. However, in order to perform a reliable accusation, the rate of any code
must be lower than the capacity of the collusion channel [1]. Finding the capacity induced by a collusion process is a hard
problem in general. This paper only deals with the achievable rate of Tardos-like codes (either with the Tardos pdf or a
flat pdf to simulate a BS code), which is defined as the maximum rate guaranteeing a reliable decoding for any collusion
process in a given class.

C. Collusion process 

Denote the subset of colluder indices by $\mathcal{C} = \{j_1, \dots, j_c\}$, and by $X_\mathcal{C} = \{\mathbf{X}_{j_1}, \dots, \mathbf{X}_{j_c}\}$ the restriction of the code to this
subset. The collusion attack is the process of taking the sequences in $X_\mathcal{C}$ as inputs and yielding the pirated sequence Y as an
output.

Traitor tracing codes were first studied by the cryptographic community, and a key concept is the marking assumption
introduced by Boneh and Shaw [3]. It states that, in its narrow-sense version, whatever the strategy of the collusion $\mathcal{C}$, we
have $Y(i) \in \{X(j_1,i), \dots, X(j_c,i)\}$. In words, colluders forge the pirated copy by assembling chunks from their personal
copies. It implies that if, at index i, the colluders' symbols are identical, then this symbol value is decoded at the i-th chunk
of the pirated copy.

This is what watermarkers have understood from the pioneering cryptographic work. However, this has led to misconceptions.
Another important thing is the way cryptographers have modeled a host content: it is a binary string where some
symbols can be changed without spoiling the regular use of the content. These locations are used to insert the code sequence 
symbols. Cryptographers assume that colluders disclose codeword symbols from their identifying sequences comparing their 
personal copies symbol by symbol. The colluders cannot spot a hidden symbol if it is identical on all copies, hence the 
marking assumption. 

In a multimedia application, for instance, the content is typically divided into chunks. A chunk can be a few second clip 
of audio or video. Symbol X(j,i) is hidden in the i-th chunk of the content with a watermarking technique. This gives 
the i-th chunk sent to the j-th user. In this paper, we only address collusion processing where the pirated copy is forged 
by picking chunks from the colluders' personal copies. We do not cope with the mixing of several chunks into one (we 
assume that the watermarking technique is robust enough to handle this mixing collusion process). The marking assumption 
still holds but for another reason: as the colluders ignore the watermarking secret key, they cannot create chunks of content 
watermarked with a symbol they do not have. However, contrary to the original cryptographic model, this also implies that 
the colluders might not know which symbol is embedded in a chunk. 

1) Mathematical model: Our mathematical model of the collusion is essentially based on four main assumptions. The first
assumption is the memoryless nature of the collusion attack. Since the symbols of the code are independent, it seems relevant
that the pirated sequence Y also shares this property. Therefore, the value of Y(i) only depends on $\{X(j_1,i), \dots, X(j_c,i)\}$.
The second assumption is the stationarity of the collusion process. Except when the Tardos code is broken (this is explained
in the next section), we assume that the collusion strategy is independent of the index i in the sequence. Therefore, we can
describe it forgetting the index i in the sequel. The third assumption is that the colluders select the value of the symbol Y
depending on the values of their symbols, but not on their order. That is, the collusion channel is invariant to permutations
of $\{X(j_1,i), \dots, X(j_c,i)\}$. Therefore, the input of the collusion process is indeed the type of their symbols. In the binary
case, this type is fully defined by the following sufficient statistic: the number $\Sigma_i$ of symbols '1': $\Sigma_i = \sum_{k=1}^{c} X(j_k,i)$.
These first three assumptions greatly simplify the analysis of the problem without restricting the power of the colluders
because they do not prevent them from implementing an optimal collusion attack (see Sections 2 and 3 of [1]). Hence, our
approach does not imply any loss of generality. The fourth assumption is that the collusion process may not be deterministic,
but random. These four assumptions imply that the collusion attack is fully described by the following parameter vector:
$\boldsymbol{\theta} = (\theta_0, \dots, \theta_c)^T$, with $\theta_\sigma = \Pr_Y[1|\Sigma = \sigma]$. The following subsection gives examples of such collusion attacks, but we
can already state that they all share the following property: the marking assumption enforces that $\theta_0 = 0$ and $\theta_c = 1$. The
authors of [10] also speak about 'eligible channel'.
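The collusion model above can be simulated directly. The sketch below is a hypothetical helper (names and example values are ours): it forges a pirated sequence from a parameter vector `theta`, with `theta[s]` playing the role of Pr[Y = 1 | s colluders hold a '1'], and checks the narrow-sense marking assumption on the result.

```python
import random

def collude(colluder_seqs, theta, rng):
    """Forge Y with the memoryless, stationary, order-invariant model theta."""
    c = len(colluder_seqs)
    if len(theta) != c + 1 or theta[0] != 0 or theta[c] != 1:
        raise ValueError("theta needs c+1 entries with theta[0] = 0 and theta[c] = 1")
    y = []
    for symbols in zip(*colluder_seqs):      # colluders' symbols at one index
        sigma = sum(symbols)                 # number of '1' among them (the type)
        y.append(1 if rng.random() < theta[sigma] else 0)
    return y

def respects_marking_assumption(colluder_seqs, y):
    """True if every Y(i) equals the symbol of at least one colluder at index i."""
    return all(y_i in symbols for symbols, y_i in zip(zip(*colluder_seqs), y))

rng = random.Random(1)                       # fixed seed for reproducibility only
seqs = [[rng.randint(0, 1) for _ in range(200)] for _ in range(3)]
y = collude(seqs, [0, 0.3, 0.7, 1], rng)     # arbitrary example attack for c = 3
```

Because `theta[0] = 0` and `theta[c] = 1`, the forged sequence can only differ from a colluder's symbol at indices where the colluders' symbols are not all identical.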

2) Classes of colluders: We introduce four classes of attacks with increasing power. 

a) Class-A: The weakest kind of colluders decides the value of the symbol Y(i) without considering all their symbols.
Before receiving the personal copies, these c dishonest users have already agreed on how to forge the pirated copy. This
strategy amounts to setting an assignation sequence $(M_1, \dots, M_m)$ with $M_i \in \mathcal{C}$, such that $Y(i) = X(M_i, i)$. We assume that the
colluders share the risk, so that the cardinality $|\{i \,|\, M_i = j\}| \approx m/c$, for all $j \in \mathcal{C}$. The assignation sequence is random and
independent of the personal copies. Hence, for each collusion size, Class-A has a single collusion attack $\boldsymbol{\theta}$ given by $\theta_\sigma = \sigma/c$,
$\forall \sigma \in \{0, \dots, c\}$. For the sake of coherence with the subsequent notation, we say that $\boldsymbol{\theta} \in \mathcal{P}_A(c) = \{c^{-1}(0, 1, \dots, c-1, c)^T\}$.

b) Class-B: This second class of colluders differs from Class-A in the fact that the assignation sequence is now a
function of the personal copies. These colluders are able to split their copies into chunks and to compare them sample by
sample. Hence, for any index i, they are able to notice that, for instance, chunks $C_{j_1,i}$ and $C_{j_2,i}$ are different or identical.
For binary embedded symbols, they can constitute two stacks, each containing identical chunks. This allows new collusion
processes such as majority vote, minority vote, coin flip [9].

The important thing is that colluders can notice differences between chunks, but they cannot tell which chunk contains
symbol '0'.² Hence, symbols '1' and '0' play a symmetric role, which strongly links the conditional probabilities:
$\Pr_Y[1|\Sigma = \sigma] = \Pr_Y[0|\Sigma = c - \sigma] = 1 - \Pr_Y[1|\Sigma = c - \sigma]$. Therefore, Class-B collusion attacks are constrained in the following way:

$$\boldsymbol{\theta} \in \mathcal{P}_B(c) \triangleq \{\boldsymbol{\theta} : \theta_0 = 0,\ \theta_c = 1,\ \theta_\sigma \in [0,1] \text{ for } \sigma \in [c-1],\ \theta_\sigma = 1 - \theta_{c-\sigma} \text{ for } \sigma \in [c-1]\}. \tag{1}$$

Hence, a Class-B collusion attack has $\lfloor (c-1)/2 \rfloor$ degrees of freedom, and for even c we necessarily have $\theta_{c/2} = 1/2$.
Clearly, for c = 2 the only possible Class-B collusion strategy is $\boldsymbol{\theta} = (0, 0.5, 1)^T$, which is also the Class-A attack. Class-B
collusion is relevant for traitor tracing in the multimedia scenario, where each bit of the code is embedded in a different
chunk (frame, group of frames, etc.) of the multimedia signal by means of a watermarking technique. The authors of [10]
refer to this class as "strongly symmetric eligible channel."

c) Class-C: This is the classical collusion model used by cryptographers since Boneh and Shaw [3]. The bits are
directly pasted in the host content string, and thus the colluders can compare their copies bitwise in order to disclose the
location of the traitor tracing code. Class-C collusion attacks are no longer constrained as in Class-B, and new strategies
are then possible such as the following:

• All '1's. The colluders put the symbol '1' whenever they can: $\theta_\sigma = 1$, $0 < \sigma < c$,

• All '0's. The colluders put the symbol '0' whenever they can: $\theta_\sigma = 0$, $0 < \sigma < c$.

In general, a Class-C collusion strategy belongs to the following set:

$$\boldsymbol{\theta} \in \mathcal{P}_C(c) \triangleq \{\boldsymbol{\theta} : \theta_0 = 0,\ \theta_c = 1,\ \theta_\sigma \in [0,1],\ \sigma \in [c]\}, \tag{2}$$

and therefore it has c - 1 degrees of freedom.
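The classes defined so far can be made concrete with a few parameter vectors. In the sketch below, the majority, minority and coin-flip vectors follow the usual definitions of these attacks (ties in the votes are assumed to be broken by a fair coin); all function names are illustrative.

```python
def class_a(c):
    return [s / c for s in range(c + 1)]              # theta_sigma = sigma / c

def majority(c):
    # a tie (only possible for even c) is assumed to be broken by a fair coin
    return [0.0 if 2 * s < c else 1.0 if 2 * s > c else 0.5 for s in range(c + 1)]

def minority(c):
    theta = [1.0 - t for t in majority(c)]
    theta[0], theta[c] = 0.0, 1.0                     # marking assumption
    return theta

def coin_flip(c):
    return [0.0] + [0.5] * (c - 1) + [1.0]

def all_ones(c):
    return [0.0] + [1.0] * c

def all_zeros(c):
    return [0.0] * c + [1.0]

def in_class_c(theta):
    """Membership test for the set of eq. (2)."""
    return theta[0] == 0 and theta[-1] == 1 and all(0 <= t <= 1 for t in theta)

def in_class_b(theta, tol=1e-12):
    """Membership test for the set of eq. (1): symmetry theta_s = 1 - theta_{c-s}."""
    c = len(theta) - 1
    return in_class_c(theta) and all(
        abs(theta[s] - (1 - theta[c - s])) <= tol for s in range(c + 1))
```

For instance, `all_ones(4)` passes `in_class_c` but not `in_class_b`, which reflects the fact that Class-B colluders cannot tell which stack of chunks carries the symbol '1'.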

d) Class-D: This last class is quite special because it no longer fulfills the stationarity assumption introduced in
Sec. II-C1. Now, the knowledge of the time-sharing sequence p is granted to the colluders. From a statistical point of view, the
conditional probabilities depend on $\Sigma_i$ and $P_i$: $\Pr_Y[1|\Sigma_i, P_i]$. The collusion model for this class is a set of c + 1 functions
$\boldsymbol{\theta}(p) = (\theta_0(p), \dots, \theta_c(p))^T$ such that

$$\boldsymbol{\theta}(p) \in \mathcal{P}_C(c),\ \forall p \in [0,1]. \tag{3}$$

The interest of Class-D is twofold: on one hand, it gives the rate achievable by a code that does not perform time-sharing
(i.e. the value of p is fixed) and on the other hand it shows the achievable rate when the code has been broken (i.e. the
secret of the probabilistic code has been disclosed), meaning that the colluders know the value of $p_i$ for every index $i \in [m]$.
Therefore, they can adapt their strategy for each chunk according to its value $p_i$. Notice that the attack is still assumed
to be memoryless.

² Note that for this to be strictly true, the probability distribution of the time-sharing variable must be symmetric, as is the case in this
paper.
D. Decoding families

The study of traitor tracing codes from an achievable rate standpoint largely decouples their performances from any
particular decoding algorithm. However, we consider two different families of decoders: the simple decoder [1, Sec. 4] and
the joint decoder [1, Sec. 5]. The simple decoder calculates the empirical mutual information between each user codeword
and the pirated sequence, whereas the joint decoder calculates the empirical mutual information between each possible subset
of c users and the pirated sequence. Due to their different nature, the two families have different achievable rates. Briefly,
the joint decoder represents what the accusation side could do in an ideal world where complexity is not an issue, and it
has been shown to be capacity-achieving. However, it has to tackle $\binom{n}{c}$ groups, which seems hardly affordable for large n.
The simple decoder, suboptimal in general, represents the upper performance limit for more practical decoders.

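1) Joint decoder: The achievable rate for the joint decoder against a given collusion attack is based on the mutual
information between Y, a symbol of the pirated sequence, and $X_\mathcal{C}$, the symbols of the colluders' code sequences [1, Sec. 5].
This holds for any index thanks to the symbol independence, and this is taken in expectation over the time-sharing random
variable P:

$$
\begin{aligned}
R_{\mathrm{joint}}(\boldsymbol{\theta}) &= \frac{1}{c} E_P\left[I(Y; X_\mathcal{C} \,|\, P = p, \Theta = \boldsymbol{\theta})\right] \\
&= \frac{1}{c}\left(E_P[H(Y|P = p, \Theta = \boldsymbol{\theta})] - E_P[H(Y|X_\mathcal{C}, P = p, \Theta = \boldsymbol{\theta})]\right) \\
&= \frac{1}{c}\left(E_P[H(Y|P = p, \Theta = \boldsymbol{\theta})] - E_P[H(Y|\Sigma, P = p, \Theta = \boldsymbol{\theta})]\right),
\end{aligned} \tag{4}
$$

where $\Sigma$ is the random variable defined as the number of ones in the set $X_\mathcal{C}$. Equality in (4) follows because of the assumption
stated in Sec. II-C1, namely that the output of the collusion channel only depends on the type of $X_\mathcal{C}$, not on the order of
its elements. For the sake of clarity, we omit the conditioning $\Theta = \boldsymbol{\theta}$ in the sequel, but all the probability, entropy or mutual
information expressions are given for a given collusion attack. Plugging in the collusion model introduced in Sec. II-C1, we have:

$$\Pr_Y[1|P = p] = \sum_{\sigma=0}^{c} \theta_\sigma \Pr_\Sigma[\sigma|P = p], \tag{5}$$

$$\Pr_Y[1|\Sigma = \sigma, P = p] = \theta_\sigma, \tag{6}$$

with $\Pr_\Sigma[\sigma|P = p] = \binom{c}{\sigma} p^\sigma (1-p)^{c-\sigma}$, known as the Bernstein polynomials [13]. Therefore, Eq. (4) can be rewritten as:

$$R_{\mathrm{joint}}(\boldsymbol{\theta}) = \frac{1}{c}\left(E_P\left[h_b(\Pr_Y[1|P = p])\right] - \sum_{\sigma=0}^{c} E_P\left[\Pr_\Sigma[\sigma|P = p]\right] h_b(\theta_\sigma)\right). \tag{7}$$

A possible interpretation of (4) is that the rate can also be expressed in terms of the average discrimination (or Kullback-Leibler
distance, as in [14]):

$$
\begin{aligned}
R_{\mathrm{joint}}(\boldsymbol{\theta}) &= E_P\left[D_{KL}(\Pr_{Y,\Sigma} \,\|\, \Pr_Y \Pr_\Sigma \,|\, P = p)\right]/c \\
&= E_P\left[\sum_{y \in \{0,1\}} \sum_{\sigma=0}^{c} \Pr_\Sigma[\sigma|P = p] \Pr_Y[y|\Sigma = \sigma] \log\frac{\Pr_Y[y|\Sigma = \sigma]}{\Pr_Y[y|P = p]}\right]/c.
\end{aligned} \tag{8}
$$

The usefulness of this expression will become clear in Section III-B.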
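A numerical evaluation of these two expressions of the joint rate can be sketched as follows for the Tardos pdf. The expectation over P is approximated with a midpoint rule after the change of variable p = sin²(φ), under which φ is uniform on [0, π/2]; this quadrature, the number of nodes and the function names are our own choices, not part of the paper.

```python
from math import comb, e, log2, pi, sin

def h_b(x):
    """Binary entropy in bits, with h_b(0) = h_b(1) = 0."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)

def tardos_nodes(N=2000):
    """Equal-weight nodes: if P follows the Tardos pdf, P = sin^2(Phi), Phi ~ U[0, pi/2]."""
    return [sin((i + 0.5) * pi / (2 * N)) ** 2 for i in range(N)]

def pr_sigma(c, s, p):
    """Binomial probability Pr[Sigma = s | P = p]."""
    return comb(c, s) * p ** s * (1 - p) ** (c - s)

def rate_joint_entropy(theta, nodes):
    """Entropy form, eq. (7)."""
    c = len(theta) - 1
    total = 0.0
    for p in nodes:
        w = [pr_sigma(c, s, p) for s in range(c + 1)]
        q1 = sum(t * ws for t, ws in zip(theta, w))            # eq. (5)
        total += h_b(q1) - sum(ws * h_b(t) for t, ws in zip(theta, w))
    return total / (len(nodes) * c)

def rate_joint_kl(theta, nodes):
    """Kullback-Leibler form, eq. (8)."""
    c = len(theta) - 1
    total = 0.0
    for p in nodes:
        w = [pr_sigma(c, s, p) for s in range(c + 1)]
        q = (sum((1 - t) * ws for t, ws in zip(theta, w)),     # Pr[Y=0|P=p]
             sum(t * ws for t, ws in zip(theta, w)))           # Pr[Y=1|P=p]
        for t, ws in zip(theta, w):
            for y, py_s in ((0, 1 - t), (1, t)):
                if py_s > 0:                                   # 0 log 0 = 0
                    total += ws * py_s * log2(py_s / q[y])
    return total / (len(nodes) * c)

nodes = tardos_nodes()
r_class_a_2 = rate_joint_entropy([0, 0.5, 1], nodes)           # Class-A attack, c = 2
```

Both functions should return the same value up to rounding, and `r_class_a_2` should be close to the value 7/8 - log(e)/2 quoted in Sec. III-B for two colluders.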

2) Simple decoder: The achievable rate for the simple decoder against a given collusion attack is given in [1, Sec. 4]:

\begin{align}
R_{\mathrm{simple}}(\boldsymbol{\theta}) &= E_P\left[I(Y; X|P = p)\right] \tag{9} \\
&= E_P[H(Y|P = p)] - E_P[H(Y|X, P = p)] \tag{10} \\
&= E_P\left[D_{KL}(\Pr_{Y,X} \,\|\, \Pr_X \Pr_Y \,|\, P = p)\right]. \tag{11}
\end{align}

This links the notion of rate to the inherent capability of distinguishing two hypotheses:

• $\mathcal{H}_0$: User j is innocent, and his codeword is independent of Y given P: $\Pr[Y, X|\mathcal{H}_0] = \Pr[Y]\Pr[X]$,

• $\mathcal{H}_1$: User j is guilty and Y has been created from his codeword: $\Pr[Y, X|\mathcal{H}_1] = \Pr[Y|X]\Pr[X]$.

The calculation of the rate needs the expressions of the conditional probabilities induced by the collusion model:

\begin{align}
\Pr_Y[1|X = 1, P = p] &= \sum_{k=1}^{c} \theta_k \binom{c-1}{k-1} p^{k-1} (1-p)^{c-k}, \tag{12} \\
\Pr_Y[1|X = 0, P = p] &= \sum_{k=0}^{c-1} \theta_k \binom{c-1}{k} p^{k} (1-p)^{c-k-1}. \tag{13}
\end{align}
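The simple-decoder rate can be evaluated numerically from (10), (12) and (13). The sketch below does so for the Tardos pdf, with the same midpoint quadrature in φ (p = sin²(φ)) as an assumed integration scheme; function names are illustrative.

```python
from math import comb, e, log2, pi, sin

def h_b(x):
    """Binary entropy in bits, with h_b(0) = h_b(1) = 0."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)

def tardos_nodes(N=2000):
    """Equal-weight nodes: if P follows the Tardos pdf, P = sin^2(Phi), Phi ~ U[0, pi/2]."""
    return [sin((i + 0.5) * pi / (2 * N)) ** 2 for i in range(N)]

def cond_probs(theta, p):
    """Return (Pr[Y=1|X=1,P=p], Pr[Y=1|X=0,P=p]) from eqs. (12) and (13)."""
    c = len(theta) - 1
    q1 = sum(theta[k] * comb(c - 1, k - 1) * p ** (k - 1) * (1 - p) ** (c - k)
             for k in range(1, c + 1))
    q0 = sum(theta[k] * comb(c - 1, k) * p ** k * (1 - p) ** (c - k - 1)
             for k in range(0, c))
    return q1, q0

def rate_simple(theta, nodes):
    """E_P[H(Y|P=p)] - E_P[H(Y|X,P=p)], eq. (10), in bits."""
    total = 0.0
    for p in nodes:
        q1, q0 = cond_probs(theta, p)
        q = p * q1 + (1 - p) * q0                  # Pr[Y=1|P=p]
        total += h_b(q) - p * h_b(q1) - (1 - p) * h_b(q0)
    return total / len(nodes)

nodes = tardos_nodes()
r1 = rate_simple([0, 1], nodes)        # single pirate: Y = X, so the rate is E_P[h_b(P)]
r2 = rate_simple([0, 0.5, 1], nodes)   # two Class-A colluders
```

For a single pirate the rate reduces to E_P[h_b(P)], which equals 2 - log(e) bits for the Tardos pdf; with two colluders the rate is strictly smaller.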
Proposition 1: Two simple considerations:

• For the classes A, B and C, the following relationships hold:

\begin{align}
\Pr_Y[1|X = 1, P = p] &= \Pr_Y[1|P = p] + \frac{1-p}{c} \frac{\partial}{\partial p} \Pr_Y[1|P = p], \tag{14} \\
\Pr_Y[1|X = 0, P = p] &= \Pr_Y[1|P = p] - \frac{p}{c} \frac{\partial}{\partial p} \Pr_Y[1|P = p]. \tag{15}
\end{align}

• For c = 2, $\mathcal{P}_A(2) = \mathcal{P}_B(2)$.

Proof: After some manipulations, we have $\frac{\partial}{\partial p} \Pr_Y[1|P = p] = c\left(\Pr_Y[1|X = 1, P = p] - \Pr_Y[1|X = 0, P = p]\right)$.
Moreover, $\Pr_Y[1|P = p] = p \Pr_Y[1|X = 1, P = p] + (1 - p) \Pr_Y[1|X = 0, P = p]$. The second item is obvious. ■
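The two relationships of Proposition 1 can be checked numerically for an arbitrary stationary attack. The following sketch compares (12) and (13) with the right-hand sides of (14) and (15), using a central finite difference for the derivative; the step `eps` and the example vector are arbitrary choices of ours.

```python
from math import comb

def pr_y1(theta, p):
    """Pr[Y=1|P=p], eq. (5)."""
    c = len(theta) - 1
    return sum(theta[s] * comb(c, s) * p ** s * (1 - p) ** (c - s) for s in range(c + 1))

def cond_probs(theta, p):
    """Return (Pr[Y=1|X=1,P=p], Pr[Y=1|X=0,P=p]) from eqs. (12) and (13)."""
    c = len(theta) - 1
    q1 = sum(theta[k] * comb(c - 1, k - 1) * p ** (k - 1) * (1 - p) ** (c - k)
             for k in range(1, c + 1))
    q0 = sum(theta[k] * comb(c - 1, k) * p ** k * (1 - p) ** (c - k - 1)
             for k in range(0, c))
    return q1, q0

def prop1_residuals(theta, p, eps=1e-6):
    """Absolute gaps between both sides of eqs. (14) and (15)."""
    c = len(theta) - 1
    dq = (pr_y1(theta, p + eps) - pr_y1(theta, p - eps)) / (2 * eps)   # d/dp Pr[Y=1|P=p]
    q = pr_y1(theta, p)
    q1, q0 = cond_probs(theta, p)
    return abs(q1 - (q + (1 - p) / c * dq)), abs(q0 - (q - p / c * dq))

res = prop1_residuals([0, 0.2, 0.9, 0.4, 1], 0.3)   # arbitrary Class-C attack, c = 4
```

Both residuals are limited only by the finite-difference error.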

3) Achievable rate under Class-Z attack: For a given decoding family and size of collusion, the achievable rate of a code
under Class-Z attack (with $Z \in \{A, B, C, D\}$) is the mutual information produced by the worst collusion process in this
class. For instance, with straightforward notation:

$$R^{Z}_{\mathrm{simple}}(c) = \min_{\boldsymbol{\theta} \in \mathcal{P}_Z(c)} R_{\mathrm{simple}}(\boldsymbol{\theta}). \tag{16}$$

Since the colluders are more and more powerful as we consider successive classes, the following relationships hold for the
simple decoder (and similarly for the joint decoder):

$$R^{A}_{\mathrm{simple}}(c) \geq R^{B}_{\mathrm{simple}}(c) \geq R^{C}_{\mathrm{simple}}(c) \geq R^{D}_{\mathrm{simple}}(c). \tag{17}$$

To stress the importance of the time-sharing variable P, it is interesting to define the function

$$r^{Z}_{\mathrm{simple}}(c, p_0) = I(Y; X|P = p_0, \Theta = \boldsymbol{\theta}^\star), \tag{18}$$

where

$$\boldsymbol{\theta}^\star = \arg\min_{\boldsymbol{\theta} \in \mathcal{P}_Z(c)} R_{\mathrm{simple}}(\boldsymbol{\theta}).$$

The strong non-convexity of $r^{Z}_{\mathrm{simple}}(c, p)$ in p, in general, justifies the need of time-sharing [1], as will be seen. Obviously,

$$R^{Z}_{\mathrm{simple}}(c) = E_P\left[r^{Z}_{\mathrm{simple}}(c, p)\right].$$

The extension of this definition to the joint decoder is straightforward.
III. The joint decoder 



A. Colluders Class-A 

Proposition 2: A Class-A collusion leads to pirated sequence symbols whose probability is $\Pr_Y[1|P = p] = p$, for any
collusion size. Hence, the achievable rate of the joint decoder against Class-A collusion is, for both the Tardos and the flat
distribution:

$$R^{A}_{\mathrm{joint}}(c) = c^{-1}\left(E_P[h_b(P)] - \sum_{\sigma=0}^{c} E_P\left[\Pr_\Sigma[\sigma|P]\right] h_b(\sigma/c)\right). \tag{19}$$

Proof: Since we have $\theta_\sigma = \sigma/c$, $\forall \sigma \in \{0, \dots, c\}$, $\Pr_Y[1|P = p]$ has a simple expression:

$$\Pr_Y[1|P = p] = c^{-1} E_\Sigma[\sigma|P = p] = p. \tag{20}$$

The last equality comes from the fact that $\Sigma$ is a random variable distributed as a binomial B(c, p), so its expectation is cp. ■

With the help of Mathematica, the expectations find closed form expressions, and the achievable rate in bits is for the Tardos
pdf:

$$R^{A}_{\mathrm{joint}}(c) = \frac{1}{c}\left(2 - \log(e) - \frac{1}{\pi} \sum_{\sigma=1}^{c-1} \frac{\Gamma(\sigma + 1/2)\,\Gamma(c - \sigma + 1/2)}{\Gamma(\sigma + 1)\,\Gamma(c - \sigma + 1)}\, h_b(\sigma/c)\right), \tag{21}$$

whereas for the flat pdf (i.e. for the probabilistic BS code):

$$R^{A}_{\mathrm{joint}}(c) = c^{-1}\left(\log(e)/2 - (c+1)^{-1} \sum_{\sigma=0}^{c} h_b(\sigma/c)\right). \tag{22}$$

The resulting achievable rates are plotted in Fig. 1. These plots suggest that they decrease as $1/c^2$. This is confirmed by
the next proposition.
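The closed forms of the Class-A joint rate are easy to evaluate. The sketch below codes them as we read them from (21) and (22) (the coefficient of each entropy term is the expectation of the binomial probability under the corresponding pdf); the function names are ours.

```python
from math import e, gamma, log2, pi

def h_b(x):
    """Binary entropy in bits, with h_b(0) = h_b(1) = 0."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)

def rate_joint_class_a_tardos(c):
    """Closed form (21): Class-A joint rate in bits for the Tardos pdf."""
    s = sum(gamma(k + 0.5) * gamma(c - k + 0.5) / (gamma(k + 1) * gamma(c - k + 1))
            * h_b(k / c) for k in range(1, c))
    return (2 - log2(e) - s / pi) / c

def rate_joint_class_a_flat(c):
    """Closed form (22): Class-A joint rate in bits for the flat pdf."""
    return (log2(e) / 2 - sum(h_b(k / c) for k in range(c + 1)) / (c + 1)) / c

# Rates in bits for a few collusion sizes: (c, Tardos pdf, flat pdf)
rates = [(c, rate_joint_class_a_tardos(c), rate_joint_class_a_flat(c)) for c in range(2, 10)]
```

For c = 2 these functions reproduce the values 7/8 - log(e)/2 and log(e)/4 - 1/6 quoted in Sec. III-B.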
Fig. 1. Achievable rates for the joint decoder against different classes of collusion, for Tardos and probabilistic BS codes. The fingerprinting capacity
(according to Amiri and Tardos) against Class-B-C colluders is plotted for comparison.



Proposition 3: For any pdf $f(p) : [0, 1] \to \mathbb{R}^+$, we have (in natural units)

$$\lim_{c \to \infty} \left(R^{A}_{\mathrm{joint}}(c) - \frac{1}{2c^2}\right) = 0. \tag{23}$$

See Appendix A-A for the proof.

Consider now the achievable rate as the expectation over P of the function $r^{A}_{\mathrm{joint}}(c, p)$ defined according to (18). This
function, which is plotted in Fig. 2 for different values of c, is symmetric around p = 1/2 because $h_b(\sigma/c) = h_b((c - \sigma)/c)$.
For c = 2 and c = 3, its maximum is at p = 1/2. This shows that the best pdf would be a Dirac delta at p = 1/2. There
would no longer be any need for the time-sharing variable P, and the code X would be composed of i.i.d. binary components with
$\Pr[X(j,i) = 1] = 1/2$. For c > 3, the maximum is no longer at p = 1/2, but at two symmetric values depending on c, so
that the capacity-achieving pdf is composed of two Dirac deltas. This is a very special case where the capacity can be
numerically derived. The achievable rates of the Tardos and the probabilistic BS codes are lower than the capacity, as can be
seen in Fig. 1. The next section shows however that the Dirac delta pdf achieving capacity in Class-A is a very dangerous
choice under other collusion classes, and that time-sharing becomes a necessity.
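The location of the maximum of this function can be explored with a simple grid search. The sketch below evaluates the Class-A function for the joint decoder, i.e. the argument of the expectation in (19) divided by c; the grid resolution and function names are arbitrary choices of ours.

```python
from math import comb, log2

def h_b(x):
    """Binary entropy in bits, with h_b(0) = h_b(1) = 0."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)

def r_joint_class_a(c, p):
    """(h_b(p) - sum_s Pr[Sigma=s|P=p] h_b(s/c)) / c, in bits."""
    cond = sum(comb(c, s) * p ** s * (1 - p) ** (c - s) * h_b(s / c) for s in range(c + 1))
    return (h_b(p) - cond) / c

def argmax_p(c, steps=1000):
    """Grid point in [0, 1] maximizing r_joint_class_a(c, .); returns the smallest maximizer."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: r_joint_class_a(c, p))

best = {c: argmax_p(c) for c in (2, 3, 4, 5)}
```

For c = 2 and c = 3 the search returns p = 1/2, whereas for c = 4 the maximizer moves away from 1/2 (by symmetry, 1 minus the returned value is an equally good choice).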



B. Colluders Classes B and C 

1) 2 colluders: Thanks to Prop. 1, the rationale for Class-A also holds for Class-B when c = 2. The Tardos code rate is
$R^{B}_{\mathrm{joint}}(2) = 7/8 - \log(e)/2 \approx 0.154$ bits, whereas the rate for the probabilistic BS code (i.e. flat pdf) is $R^{B}_{\mathrm{joint}}(2) = \log(e)/4 - 1/6 \approx 0.194$ bits.
The capacity is achieved with the Dirac delta pdf, $f(p) = \delta(p - 1/2)$, and it is approximately 0.25 bits. We
recover the same result as Amiri and Tardos (see line 2 of Table 1 in [10]). However, this strategy is very risky if the
number of colluders is actually bigger than 2, as we shall see in the next section.

2) More colluders: When $c \geq 3$, the analysis is much more complex and we have only succeeded in finding the worst
collusion process and thus, the achievable rate for a given pdf f(p).
Fig. 2. Plot of $r^{A}_{\mathrm{joint}}(c, p)$ for the joint decoder under Class-A attack.

We resort now to the expression (8) of the achievable rate in terms of the relative entropy. The problem of minimizing
(8) can be rewritten as a double minimization, exactly as in the Blahut-Arimoto algorithm for the computation of the
rate-distortion function [14]. The main difference is that our minimization problem corresponds to a degenerate rate-distortion
problem where the only distortion constraint is that $\boldsymbol{\theta} \in \mathcal{P}_{B\text{-}C}(c)$ (in the sense that $\boldsymbol{\theta} \in \mathcal{P}_B(c)$ or $\boldsymbol{\theta} \in \mathcal{P}_C(c)$ depending
on the class of colluders). The reader is referred to [14] or [11, Chapt. 13] for a detailed presentation of the Blahut-Arimoto
algorithm as we only explain its application to our model.
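A minimal sketch of such an alternating minimization is given here for Class-C colluders and the Tardos pdf. Several points are assumptions on our side rather than the authors' implementation: the law of Y is refreshed from (5) between two updates of the attack (the usual second step of Blahut-Arimoto iterations), each free parameter is updated by equating its log-odds to the average log-odds of Pr[Y=1|P=p] weighted by the binomial probability (the stationarity condition of the relative-entropy form (8)), the iteration starts from the Class-A attack, and the expectation over P uses a midpoint rule in φ with p = sin²(φ).

```python
from math import comb, log2, pi, sin

def h_b(x):
    """Binary entropy in bits, with h_b(0) = h_b(1) = 0."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * log2(x) - (1 - x) * log2(1 - x)

def tardos_nodes(N=400):
    """Equal-weight nodes: if P follows the Tardos pdf, P = sin^2(Phi), Phi ~ U[0, pi/2]."""
    return [sin((i + 0.5) * pi / (2 * N)) ** 2 for i in range(N)]

def pr_sigma(c, s, p):
    return comb(c, s) * p ** s * (1 - p) ** (c - s)

def marginals(theta, nodes):
    """(Pr[Y=0|P=p], Pr[Y=1|P=p]) at every node, following eq. (5)."""
    c = len(theta) - 1
    out = []
    for p in nodes:
        w = [pr_sigma(c, s, p) for s in range(c + 1)]
        out.append((sum((1 - t) * ws for t, ws in zip(theta, w)),
                    sum(t * ws for t, ws in zip(theta, w))))
    return out

def rate_joint(theta, nodes):
    """Joint rate in bits, entropy form (7)."""
    c = len(theta) - 1
    total = 0.0
    for p, (_, q1) in zip(nodes, marginals(theta, nodes)):
        total += h_b(q1) - sum(pr_sigma(c, s, p) * h_b(theta[s]) for s in range(c + 1))
    return total / (len(nodes) * c)

def update_theta(q, nodes, c):
    """Minimize over theta for a fixed law q of Y (theta_0 = 0 and theta_c = 1 stay fixed)."""
    theta = [0.0] * (c + 1)
    theta[c] = 1.0
    for s in range(1, c):
        w = [pr_sigma(c, s, p) for p in nodes]
        B = sum(ws * log2(q0 / q1) for ws, (q0, q1) in zip(w, q)) / sum(w)
        theta[s] = 1.0 / (1.0 + 2.0 ** B)
    return theta

def worst_case_class_c(c, iters=200, N=400):
    nodes = tardos_nodes(N)
    theta = [s / c for s in range(c + 1)]          # assumed starting point: Class-A attack
    for _ in range(iters):
        theta = update_theta(marginals(theta, nodes), nodes, c)
    return theta, rate_joint(theta, nodes)

theta3, rate3 = worst_case_class_c(3)
```

Since each step can only decrease the objective, the final rate is never above that of the starting attack; the output for c = 3 can be compared with the corresponding row of Table I.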

In a slight abuse of notation, let us denote the rhs of (8) by $R(\Pr_Y[Y|P], \boldsymbol{\theta})$. The worst collusion process is disclosed by
iteratively minimizing over each argument of this function, keeping the other constant. Thus, each iteration is comprised of
two steps:

1) In the first step of the k-th iteration, for a fixed law $\Pr_Y[1|P = p] = q^{(k-1)}(p)$ whose expression complies with (5),
we minimize $R(q^{(k-1)}(p), \boldsymbol{\theta})$ over $\boldsymbol{\theta}$. Note that $R(q^{(k-1)}(p), \boldsymbol{\theta})$ is convex in $\boldsymbol{\theta}$ because for fixed $p_0 \in [0,1]$, or
equivalently for fixed $\Pr_\Sigma[\sigma|P = p_0]$, the argument of the expectation in (8) is convex in $\boldsymbol{\theta}$ [11, Th. 2.7.4], and the
expectation of this function over P is still convex. Hence, the minimization of $R(q^{(k-1)}(p), \boldsymbol{\theta})$ amounts to canceling
the (c - 1) partial derivatives ($\theta_0$ and $\theta_c$ are already fixed to 0 and 1, respectively). Notice that we also have to impose
the constraint $\boldsymbol{\theta} \in \mathcal{P}_{B\text{-}C}(c)$. Ignoring temporarily this constraint, we have

$$\frac{\partial}{\partial \theta_\sigma} R(q^{(k-1)}(p), \boldsymbol{\theta}) = \frac{1}{c} E_P\left[\Pr_\Sigma[\sigma|P = p] \left(\log\frac{1 - q^{(k-1)}(p)}{q^{(k-1)}(p)} - \log\frac{1 - \theta_\sigma}{\theta_\sigma}\right)\right]. \tag{24}$$

By setting the last expression to 0, we obtain

$$\theta^{(k)}_\sigma = \frac{1}{1 + 2^{B^{(k)}(\sigma)}}, \tag{25}$$

with

$$B^{(k)}(\sigma) = \frac{E_P\left[\Pr_\Sigma[\sigma|P = p] \log\frac{1 - q^{(k-1)}(p)}{q^{(k-1)}(p)}\right]}{E_P\left[\Pr_\Sigma[\sigma|P = p]\right]}. \tag{26}$$

Note that $B^{(k)}(\sigma)$ is well defined because:

• $q^{(k-1)}(p)$ is a polynomial of degree c which equals 0 only for p = 0 (resp. $q^{(k-1)}(p) = 1$ only for p = 1);

• $\Pr_\Sigma[\sigma|P = p]$ is also a polynomial that goes to zero for p = 0 and p = 1. Hence, by continuity, the argument of the
expectation in the numerator of (26) equals 0 for p = 0 and p = 1;
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TABLE I 

Worst collusion attacks, joint decoder, Tardos pdf, Class-C. 



c 


0* 


Rf. ,(c) in bits 
joint v > 


2 


(0,0.5, 1) 


0.153 


3 


(0,0.340,0.660, 1) 


0.071 


4 


(0,0.260,0.5,0.741, 1) 


0.041 


5 


(0, 0.209, 0.403, 0.597, 0.791, 1) 


0.026 


6 


(0, 0.176, 0.338, 0.5, 0.662, 0.824, 1) 


0.019 


7 


(0, 0.151, 0.291, 0.431, 0.569, 0.709, 0.849, 1) 


0.014 


8 


(0, 0.133, 0.256, 0.378, 0.5, 0.622, 0.744, 0.867, 1) 


0.011 


9 


(0, 0.119, 0.229, 0.338, 0.446, 0.554, 0.662, 0.771, 0.881, 1) 


0.008 



• the denominator of (26) doesn't cancel, as there exists $p \in\, ]0, 1[$ such that $f(p) > 0$.

Finally, Eq. (25) is always between 0 and 1, showing that the constraint $\theta \in \mathcal{P}_{B-C}(c)$ is actually inactive.

2) The second step of the $k$-th iteration consists in updating the function $\Pr_Y[1|P = p]$ in order to provide the next function $q^{(k)}(p)$ with respect to the new collusion model $\theta^{(k)}$ found in the first step. This is done by finding the function $q^{(k)}(p)$ minimizing the functional $R(q(p), \theta^{(k)})$. Let us denote by $r(q(p))$ the integrand of (9) for a fixed $\theta^{(k)}$. We extend the notion of derivative of $r(q(p))$ with respect to $q(p)$ through a Taylor expansion of the difference

$$R(q(p) + \epsilon(p), \theta^{(k)}) - R(q(p), \theta^{(k)}) = E_P\left[\frac{\partial r}{\partial q(p)}\, \epsilon(p)\right] + E_P\left[o(\epsilon(p))\right].$$

The minimum is reached for a function $q^{(k)}(p)$ such that any perturbation $\epsilon(p)$ doesn't change the value of the functional, at least up to the first order. In other words, it cancels $\partial r / \partial q(p)$. This leads to the following update:

$$q^{(k)}(p) = \sum_{\sigma=0}^{c} \theta_\sigma^{(k)} \Pr_\Sigma[\sigma|P = p]. \quad (27)$$

Very much like for the Blahut-Arimoto algorithm, convergence to the worst collusion channel is monotonic, i.e. every step decreases the objective function. Since the optimization problem is convex, convergence to the optimal $\theta$ is assured.
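The two-step iteration can be sketched in a few lines. The code below is only an illustration under stated assumptions: the rate is taken as $c^{-1} E_P[\sum_\sigma \Pr_\Sigma[\sigma|P=p]\, D(\theta_\sigma \| q(p))]$ in bits, the first step uses $\theta_\sigma = 1/(1 + B(\sigma))$ with $B(\sigma)$ a power of 2 (logarithms in base 2), the Tardos pdf is discretized with a midpoint rule after the substitution $p = \sin^2 t$, and the iteration starts from the Class-A attack. The function names (`tardos_grid`, `worst_attack`, ...) are hypothetical.

```python
import math

def tardos_grid(n=400):
    """Quadrature nodes/weights for f(p) = 1/(pi*sqrt(p(1-p))) via p = sin^2(t)."""
    step = (math.pi / 2) / n
    nodes = [math.sin((i + 0.5) * step) ** 2 for i in range(n)]
    return nodes, [1.0 / n] * n

def binom_pmf(c, p):
    """Pr_Sigma[sigma | P = p] for sigma = 0..c."""
    return [math.comb(c, s) * p**s * (1 - p) ** (c - s) for s in range(c + 1)]

def kl_bits(t, q, one_minus_q):
    """Relative entropy D(Bernoulli(t) || Bernoulli(q)) in bits."""
    d = 0.0
    if t > 0.0:
        d += t * math.log2(t / q)
    if t < 1.0:
        d += (1 - t) * math.log2((1 - t) / one_minus_q)
    return d

def worst_attack(c, nodes, weights, iters=200):
    pmfs = [binom_pmf(c, p) for p in nodes]
    theta = [s / c for s in range(c + 1)]      # assumed start: Class-A attack
    rates = []
    for _ in range(iters):
        # Second step: q(p) = sum_sigma theta_sigma * Pr[sigma | p].
        q = [sum(t * pr for t, pr in zip(theta, pmf)) for pmf in pmfs]
        omq = [sum((1 - t) * pr for t, pr in zip(theta, pmf)) for pmf in pmfs]
        # Achievable rate (bits) of the current attack theta.
        rates.append(sum(
            w * sum(pr * kl_bits(t, qi, oi) for t, pr in zip(theta, pmf))
            for w, pmf, qi, oi in zip(weights, pmfs, q, omq)) / c)
        # First step: update the free components theta_1..theta_{c-1}.
        for s in range(1, c):
            num = sum(w * pmf[s] * math.log2(oi / qi)
                      for w, pmf, qi, oi in zip(weights, pmfs, q, omq))
            den = sum(w * pmf[s] for w, pmf in zip(weights, pmfs))
            theta[s] = 1.0 / (1.0 + 2.0 ** (num / den))
    return theta, rates

nodes, weights = tardos_grid()
theta3, rates3 = worst_attack(3, nodes, weights)
```

`rates3` records the rate of each successive attack, so its monotonic decrease can be inspected directly; `theta3` is the attack reached after the last update.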

Fig. 1 shows the resulting achievable rate $R^{C}_{joint}(c)$ when this algorithm is applied to the Tardos and probabilistic BS codes. We observe two surprising facts:

Proposition 4: For a symmetric $f(p)$ (be it a continuous pdf or a discrete pmf), the absolute minimum of the rate over $\theta \in \mathcal{P}_{C}(c)$ is achieved for a Class-B collusion attack (i.e. $\theta^{*}_\sigma = 1 - \theta^{*}_{c-\sigma}$, $\forall \sigma \in \{0, \ldots, c\}$), hence $R^{B}_{joint}(c) = R^{C}_{joint}(c)$.

Proof: See Appendix A-B. ■

This proposition is contained in [10, Lemma 4.1] (although its proof is not given therein), which states that the capacity is achieved with a symmetric $f(p)$ and a "strongly symmetric channel", i.e. a Class-B attack. Yet, these authors were able to 'computationally' find the worst collusion attack only for the capacity-achieving pdf $f(p)$, whereas we provide here a powerful solving algorithm for any pdf. On the other hand, they are able to find the capacity-achieving pdf. Yet, it is a discrete pmf with a strong dependency on the collusion size, which is not known in practice. The worst case attacks are given in Table I for small collusion sizes and the Tardos pdf.

The second fact is illustrated in Figure 3(a), showing that the difference $\theta^{*}_\sigma - \sigma/c$ for the Tardos pdf is very small, especially for large $c$. This means that the optimal Class-B-C strategy for the Tardos pdf is surprisingly close to the Class-A attack, a fact reflected in Figure 1, where one can see that the rates under Class-A-B-C attacks are indistinguishable for the Tardos pdf. Interestingly, it has been mentioned in [12] that the Class-A attack seems to be asymptotically optimal when the optimal $f(p)$ (which is asymptotically very close to the Tardos $f(p)$) is used. Nevertheless, this does not mean that $\theta^{*}_\sigma$ strictly converges to $\sigma/c$, neither for the Tardos $f(p)$ nor for other arbitrary time-sharing pdfs. As shown in Figure 3(b), for instance, the worst Class-B attack for the probabilistic BS code diverges from Class-A as $c$ is increased. Indeed, the achievable rates plotted in Figure 1 for this code under Class-A and Class-B-C are different.

In conclusion, this section lowers the importance of the colluders classification introduced in Sec. II-C2 for the Tardos and probabilistic BS codes. Surprisingly, there is no need to distinguish Class-B and Class-C, and experimentally, the worst collusion attack against the Tardos code seems to be very close to Class-A. In the light of Prop. 3, this would mean that the achievable rate under Class-B of the Tardos code asymptotically converges to $1/(2\ln(2)c^{2})$, just like the capacity for many pirates given in [10, Sec. 4.2], which is also plotted in Fig. 1 for reference. In this regard, it is worth recalling that the capacity-achieving time-sharing distribution depends on the number of colluders, whereas the Tardos and flat pdfs remain independent of $c$.

Fig. 3. Worst Class-B-C collusion attack against the joint decoder, plotted against $\sigma/c$: (a) Tardos, (b) probabilistic BS.


C. Colluders Class-D 

Remember that Class-D colluders have disclosed the exact values of $\{p_i\}_{i \in [m]}$, so that their strategy is, a priori, no longer stationary but, on the contrary, dependent on $p$. The colluders minimize the achievable rate of the code by finding the worst collusion attack $\theta^{*}(p_i)$ minimizing $I(Y; \Sigma|P = p_i)$.

Proposition 5: The worst case collusion strategy minimizing the rate of the joint decoder is given by $\boldsymbol{\theta}^{*}(p) = [0, \theta^{*}(p), \ldots, \theta^{*}(p), 1]$ with

$$\theta^{*}(p) = \frac{p^{c}}{p^{c} + (1-p)^{c}}. \quad (28)$$

Proof: See Appendix A-C. ■

The worst case attack is not constant along the pirated sequence if time-sharing has been done. Note also that it depends on $c$, the number of colluders. Interestingly, as $c$ grows, the worst case attack amounts to a simple deterministic strategy, which depends only on whether $p$ is larger or smaller than 1/2, as illustrated in Fig. 4(a). It amounts to selecting the All '1' (resp. All '0') strategy when $p > 1/2$ (resp. $p < 1/2$) and the 'coin-flip' strategy (cf. Sect. II-C2) when $p = 1/2$. The resulting $r^{D}_{joint}(c, p)$ is shown in Fig. 4(b) for different values of $c$ as a function of $p$. Fig. 1 shows the achievable rate $R^{D}_{joint}(c) = E_P[r^{D}_{joint}(c, P)]$ for the Tardos and probabilistic BS codes, where we can see that both rapidly decrease with similar slope. It is very interesting to notice that, although the colluders have disclosed the secret of the code, they cannot set the rate to 0. Nevertheless, the following proposition shows that the capacity vanishes exponentially fast as the number of colluders is increased.

Proposition 6: For any value of $c \geq 2$, capacity under Class-D collusion is achieved with $f(p) = \delta(p - 1/2)$, and it is given in bits by

$$C^{D}(c) = \frac{2^{1-c}}{c}. \quad (29)$$

Proof: See Appendix A-D. ■

According to [1], (29) is also the capacity reached by a code which does not perform time-sharing. Thus, as far as a joint decoder is concerned, time-sharing under Class-D collusion does not bring any gain in terms of capacity. The comparison between this exponentially vanishing capacity (with exponent $1 - c$) and the achievable rate under Class-A, which only decreases as $1/c^{2}$, illustrates the dramatic benefits of keeping the time-sharing sequence secret.
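The Class-D attack against the joint decoder is simple enough to evaluate directly. The sketch below assumes the closed form $\theta^{*}(p) = p^{c}/(p^{c} + (1-p)^{c})$ and uses the fact, taken from the proof in Appendix A-C, that under this attack $\Pr_Y[1|P=p] = \theta^{*}(p)$; the per-position rate then reduces to $c^{-1}(p^{c} + (1-p)^{c})\, h_b(\theta^{*}(p))$. Function names are illustrative.

```python
import math

def h_b(x):
    """Binary entropy in bits, with the convention 0*log(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def theta_star(c, p):
    """Common value of theta_1..theta_{c-1} under the worst Class-D attack."""
    return p**c / (p**c + (1 - p) ** c)

def r_joint_D(c, p):
    """(1/c) * I(Y; Sigma | P = p) when the interior thetas equal theta_star."""
    # Only Sigma = 0 and Sigma = c (all colluders agree) leave Y deterministic.
    mass_ends = p**c + (1 - p) ** c
    return mass_ends * h_b(theta_star(c, p)) / c

# The attack approaches a hard decision on p as c grows.
sharp_high = theta_star(20, 0.6)
sharp_low = theta_star(20, 0.4)
# Rate at p = 1/2 for a few collusion sizes.
rates_half = {c: r_joint_D(c, 0.5) for c in range(2, 9)}
```

Evaluating `r_joint_D` on a grid of $p$ values shows the single peak at $p = 1/2$ and its exponential decay with $c$.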
Fig. 4. Worst case Class-D attack against the joint decoder: parameter of the worst case attack (a), and $r^{D}_{joint}(c, p)$ (b).

Fig. 5. Achievable rates for the simple decoder against different classes of collusion, for Tardos and probabilistic BS codes, as a function of $c$.

IV. The simple decoder 

A. Colluders Class-A 

Proposition 7: A Class-A collusion produces the following achievable rate:

$$R^{A}_{simple}(c) = E_P[h_b(P)] - E_P\left[P\, h_b(P + (1 - P)/c) + (1 - P)\, h_b(P(1 - 1/c))\right]. \quad (30)$$

Proof: Prop. □ and (20) yield that $\Pr_Y[1|X = 1, P = p] = p + (1 - p)/c$ and $\Pr_Y[1|X = 0, P = p] = p - p/c$. These expressions are then plugged in (11). ■

Fig. 6. Plot of $r^{A}_{simple}(c, p)$ for the simple decoder under Class-A attack.

Fig. 5 shows the achievable rate (obtained through numerical integration) against the collusion size for the Tardos and the probabilistic BS codes. As can be seen, the rate for the probabilistic BS code against Class-A is higher than that of the Tardos code. The reason lies in the shape of $r^{A}_{simple}(c, p)$, which is plotted in Fig. 6 for different values of $c$. This figure suggests that $r^{A}_{simple}(c, p)$ achieves its global maximum at $p = 1/2$. Accordingly, capacity for the simple decoder against Class-A would be achieved when $f(p) = \delta(p - 1/2)$.
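The numerical integration mentioned above is straightforward to reproduce. The following sketch evaluates the integrand of (30) and averages it over the Tardos and flat pdfs; the quadrature (midpoint rule, with $p = \sin^2 t$ for the Tardos pdf) and the function names are assumptions made for illustration.

```python
import math

def h_b(x):
    """Binary entropy in bits, with the convention 0*log(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def r_simple_A(c, p):
    """Integrand of (30): I(Y; X | P = p) under a Class-A attack."""
    return (h_b(p)
            - p * h_b(p + (1 - p) / c)
            - (1 - p) * h_b(p * (1 - 1 / c)))

def rate_tardos(c, n=4000):
    """Average of r_simple_A over f(p) = 1/(pi*sqrt(p(1-p)))."""
    step = (math.pi / 2) / n
    return sum(r_simple_A(c, math.sin((i + 0.5) * step) ** 2)
               for i in range(n)) / n

def rate_flat(c, n=4000):
    """Average of r_simple_A over the uniform pdf on [0, 1]."""
    return sum(r_simple_A(c, (i + 0.5) / n) for i in range(n)) / n

tardos = {c: rate_tardos(c) for c in range(2, 7)}
flat = {c: rate_flat(c) for c in range(2, 7)}
```

Printing `tardos` and `flat` side by side gives the ordering between the two codes for small collusion sizes.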



B. Colluders Classes B and C 

We do not have any proof for Classes B and C. We were only able to find the worst collusion attack thanks to a numerical optimization tool, which performs well only if the collusion size is not too big: $c < 15$. This was done for the Tardos and probabilistic BS codes. For the Tardos code, Table II shows the resulting worst collusion attacks for $c < 10$. The rate achievable by the Tardos and probabilistic BS codes under the worst Class B-C attacks is plotted in Fig. 5, which suggests that the Tardos pdf is a better choice than the flat pdf as the number of colluders increases (note that the rate for the flat pdf already becomes lower than that of the Tardos pdf for $c = 8$).

The observation of the numerical results allows us to formulate the two following conjectures (without formal proof).

Conjecture 1: For a symmetric $f(p)$, the Class-C worst collusion attack indeed belongs to the Class-B subset, i.e. $\theta^{*}_\sigma = 1 - \theta^{*}_{c-\sigma}$, $\forall \sigma \in [c]$, and $R^{B}_{simple}(c) = R^{C}_{simple}(c)$.

Conjecture 2: For the Tardos pdf $f(p) = 1/(\pi\sqrt{p(1 - p)})$, the worst collusion attack surprisingly makes the probability $\Pr_Y[1|P = p]$ converge to $q_{conv}(p) = 2\arcsin(\sqrt{p})/\pi$ as the collusion size increases.$^{3}$ More specifically, $\Pr_Y[1|P = p]$ is the orthogonal projection of $q_{conv}(p)$ over the affine subspace spanned by the Bernstein polynomials $\{\Pr_\Sigma[\sigma|P = p]\}_{\sigma \in [c-1]}$ and containing the polynomial $\Pr_\Sigma[c|P = p]$. In other words, $\int_0^1 (\Pr_Y[1|P = p] - q_{conv}(p)) \Pr_\Sigma[\sigma|P = p]\, dp = 0$, $\forall \sigma \in [c-1]$.

Fig. 7(a) illustrates how the probability $\Pr_Y[1|P = p]$ quickly converges to $q_{conv}(p)$ (the thick solid line) as $c$ is increased. Fig. 7(b) shows the resulting rates $r^{C}_{simple}(c, p)$. According to the second conjecture, in practice we can obtain the parameters of the worst case attack for the Tardos pdf by performing the projection of $q_{conv}(p) - \Pr_\Sigma[c|P = p]$ onto the linear subspace spanned by the Bernstein polynomials. The Durrmeyer-Sevy algorithm is an elegant way to perform this orthogonal projection [13, Th. 2].
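The limit function in Conjecture 2 is the cumulative distribution function of the Tardos pdf. As a short worked step, assuming $f(p) = 1/(\pi\sqrt{p(1-p)})$ and the substitution $p = \sin^2 t$ (so that $dp = 2\sin t\cos t\, dt$ and $\sqrt{p(1-p)} = \sin t \cos t$):

```latex
\begin{align*}
q_{conv}(p_0)
  &= \int_0^{p_0} \frac{dp}{\pi\sqrt{p(1-p)}}
   = \int_0^{\arcsin\sqrt{p_0}} \frac{2\sin t\cos t}{\pi \sin t\cos t}\, dt \\
  &= \frac{2}{\pi}\arcsin\sqrt{p_0}, \qquad p_0 \in [0,1].
\end{align*}
```

In particular $q_{conv}(0) = 0$, $q_{conv}(1/2) = 1/2$ and $q_{conv}(1) = 1$, which is compatible with the marking assumption ($\theta_0 = 0$, $\theta_c = 1$) and with the symmetry of a Class-B attack.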

$^{3}$ Note that $q_{conv}(p_0)$ is nothing but the integral in $p$ of the Tardos pdf from 0 to $p_0$.

Fig. 7. Worst Class-C collusion attack against the simple decoder and Tardos time-sharing pdf: plot of $\Pr_Y[1|P = p]$ (a), and plot of $r^{C}_{simple}(c, p)$ (b).

TABLE II

Worst collusion attacks, simple decoder, Tardos pdf, Class-C.

| $c$ | $\theta^{*}$ | $R^{C}_{simple}(c)$ in bits |
|---|---|---|
| 2 | (0, 0.5, 1) | 0.087 |
| 3 | (0, 0.652, 0.348, 1) | 0.035 |
| 4 | (0, 0.488, 0.5, 0.512, 1) | 0.02 |
| 5 | (0, 0.594, 0.000, 1.000, 0.406, 1) | 0.013 |
| 6 | (0, 0.503, 0.175, 0.500, 0.825, 0.497, 1) | 0.009 |
| 7 | (0, 0.492, 0.000, 0.899, 0.101, 1.000, 0.508, 1) | 0.007 |
| 8 | (0, 0.471, 0.000, 0.689, 0.500, 0.310, 1.000, 0.529, 1) | 0.005 |
| 9 | (0, 0.440, 0.000, 0.698, 0.230, 0.770, 0.302, 1.000, 0.560, 1) | 0.004 |

C. Colluders Class-D 

The mutual information between $Y$ and $X$ knowing the value of $p$ is as follows:

$$I(Y; X|P = p) = H(Y|P = p) - H(Y|X, P = p) \quad (31)$$
$$= h_b(\Pr_Y[1|P = p]) - p\, h_b(\Pr_Y[1|X = 1, P = p]) - (1 - p)\, h_b(\Pr_Y[1|X = 0, P = p]). \quad (32)$$

1) 2 colluders: For the case $c = 2$, the collusion strategy has only one degree of freedom, i.e. $\theta = [0, \theta_1, 1]$.

Proposition 8: The worst collusion strategy for $c = 2$ is given by $\theta^{*}_1 = p^{2}/(p^{2} + (1 - p)^{2})$, exactly as for the joint decoder (see Prop. 5).

Proof: The steps are roughly the same as those followed in Appendix A-C for the joint decoder. Taking the derivative of (32) with respect to $\theta_\sigma$, we obtain

$$\frac{\partial}{\partial \theta_\sigma} I(Y; X|P = p) = \Pr_\Sigma[\sigma|P = p] \log(A(\theta, p)), \quad (33)$$

where

$$A(\theta, p) = \left(\frac{1 - \Pr_Y[1|P = p]}{\Pr_Y[1|P = p]}\right) \left(\frac{\Pr_Y[1|X = 1, P = p]}{1 - \Pr_Y[1|X = 1, P = p]}\right)^{\sigma/c} \left(\frac{\Pr_Y[1|X = 0, P = p]}{1 - \Pr_Y[1|X = 0, P = p]}\right)^{(c-\sigma)/c}. \quad (34)$$

It only remains to search for the collusion strategy that makes $A(\theta, p) = 1$, taking into account that for $c = 2$, $\theta = [0, \theta_1, 1]$.
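The closed form of Prop. 8 can be checked against a brute-force minimization of the mutual information (32). The sketch below assumes $c = 2$ and $\theta = [0, \theta_1, 1]$, for which $\Pr_Y[1|X=1,P=p] = p + \theta_1(1-p)$, $\Pr_Y[1|X=0,P=p] = \theta_1 p$ and $\Pr_Y[1|P=p] = p^2 + 2p(1-p)\theta_1$; the grid search and the names are illustrative only.

```python
import math

def h_b(x):
    """Binary entropy in bits, with the convention 0*log(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def mi_simple_c2(theta1, p):
    """I(Y; X | P = p) for c = 2 and theta = [0, theta1, 1]."""
    q1 = p + theta1 * (1 - p)          # Pr_Y[1 | X = 1, P = p]
    q0 = theta1 * p                    # Pr_Y[1 | X = 0, P = p]
    q = p * q1 + (1 - p) * q0          # Pr_Y[1 | P = p]
    return h_b(q) - p * h_b(q1) - (1 - p) * h_b(q0)

def argmin_grid(p, n=20000):
    """Grid point of [0, 1] minimizing the mutual information."""
    best = min(range(n + 1), key=lambda i: mi_simple_c2(i / n, p))
    return best / n

def theta1_closed_form(p):
    return p**2 / (p**2 + (1 - p) ** 2)

checks = {p: (argmin_grid(p), theta1_closed_form(p)) for p in (0.1, 0.3, 0.5, 0.8)}
```

Each entry of `checks` pairs the numerical minimizer with the closed-form value for one $p$.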
2) More colluders: When $c > 2$, obtaining a closed-form expression for the worst case attack is not possible in general. However, it is possible to reduce the computation of the optimal collusion strategy to a simple line search or to solving a linear equation. This is based on some fundamental results given in Lemma 1 and Lemma 2 below.

Lemma 1: The worst case collusion strategy when 3 or more colluders are involved achieves null rate in the range $p \in [\eta_c, 1 - \eta_c]$, with $\eta_c$ the unique real root in the interval $[1/c, 2/c]$ of the following polynomial:

$$(1 - p)^{c-2}(1 - cp) + p^{c-1}. \quad (35)$$

Moreover, the value of $\eta_c$ asymptotically approaches $1/c$ as $c$ is increased.

Proof: See Appendix B-A. ■
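The root $\eta_c$ of Lemma 1 is easy to locate numerically. The sketch below uses a plain bisection on $[1/c, 2/c]$; it assumes the sign change shown in Appendix B-A (positive at $1/c$, negative at $2/c$ for $c > 3$) and treats $c = 3$ separately, since the polynomial then equals $(2p-1)^2$ and only touches zero at $p = 1/2$. The function names are hypothetical.

```python
def j_low(c, p):
    """Polynomial (35): (1-p)^(c-2) * (1 - c*p) + p^(c-1)."""
    return (1 - p) ** (c - 2) * (1 - c * p) + p ** (c - 1)

def eta(c, iters=200):
    """Root of (35) in [1/c, 2/c], found by bisection (c >= 3)."""
    if c == 3:
        return 0.5          # double root of (2p - 1)^2, no sign change
    lo, hi = 1.0 / c, 2.0 / c
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if j_low(c, mid) > 0:
            lo = mid        # still on the positive side, root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

gaps = {c: eta(c) - 1.0 / c for c in (3, 4, 5, 10)}
```

The dictionary `gaps` holds $\eta_c - 1/c$, the quantity whose fast decay with $c$ is claimed in the lemma.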
Lemma 2: Let $\eta_c$ be the root given in Lemma 1. For $p \notin [\eta_c, 1 - \eta_c]$ and $c \geq 3$, there is at most one component of $\theta^{*}(p)$ which is not equal to zero or one:

• If $p < \eta_c$, the worst collusion is of the form $\theta_a(p) = (0, \theta_1(p), 0, \ldots, 0, 1)^{T}$. Furthermore, $\theta_1(p) = 1$ for $p \in [1/c, \eta_c]$.

• If $p > 1 - \eta_c$, the worst collusion is of the form $\theta_b(p) = (0, 1, \ldots, 1, \theta_{c-1}(p), 1)^{T}$. Furthermore, $\theta_{c-1}(p) = 0$ for $p \in [1 - \eta_c, 1 - 1/c]$.

Proof: See Appendix B-B. ■
Using Lemmas 1 and 2, the optimal collusion strategy is characterized by the following proposition.

Proposition 9: The worst Class-D collusion strategy $\theta^{*}(p)$ for a simple decoder is given by:

1) In the interval $p \in [\eta_c, 1 - \eta_c]$, $\theta^{*}(p) \in \mathcal{H}_c$, where

$$\mathcal{H}_c \triangleq \{\theta \in \mathcal{P}_{D}(c) : \theta^{T}(q_{s1} - q_{s0}) = 0\},$$

with

$$q_{s1} = (\Pr_\Sigma[0|X = 1, P = p], \ldots, \Pr_\Sigma[c|X = 1, P = p])^{T} \quad (36)$$
$$q_{s0} = (\Pr_\Sigma[0|X = 0, P = p], \ldots, \Pr_\Sigma[c|X = 0, P = p])^{T}. \quad (37)$$

2) For $p \notin [\eta_c, 1 - \eta_c]$, $\theta^{*}(p)$ is given by Lemma 2 with $\theta_1(p) = 1 - \theta_{c-1}(1 - p) = \theta^{*}$, which is defined as

$$\theta^{*} = \arg\min_{\theta \in [0,1]} \left(h_b(g_1(\theta, c, p)) - p\, h_b(g_2(\theta, c, p)) - (1 - p)\, h_b(g_3(\theta, c, p))\right), \quad (38)$$

where

$$g_1(\theta, c, p) = \theta c p (1 - p)^{c-1} + p^{c},$$
$$g_2(\theta, c, p) = \theta (1 - p)^{c-1} + p^{c-1},$$
$$g_3(\theta, c, p) = \theta (c - 1) p (1 - p)^{c-2}.$$

Proof: Proving the first part of the proposition is straightforward: if $p$ belongs to the interval defined in Lemma 1, then the global minimum of the mutual information functional is necessarily achieved by a vector $\theta \in \mathcal{P}_{D}(c)$. In such a case, according to the proof of Lemma 1, the optimal collusion strategy must fulfill the condition (50).

For proving the second part of the proposition, we resort to Lemma 2, which states that for $p \notin [\eta_c, 1 - \eta_c]$ the optimal strategy has only one degree of freedom. We have to consider two cases:

1) If $p < 1/2$: We have $\Pr_Y[1|P = p, \Theta = \theta_a(p)] = g_1(\theta_1(p), c, p)$, $\Pr_Y[1|P = p, \Theta = \theta_a(p), X = 1] = g_2(\theta_1(p), c, p)$, and $\Pr_Y[1|P = p, \Theta = \theta_a(p), X = 0] = g_3(\theta_1(p), c, p)$. Hence, the parameter $\theta_1(p)$ of the optimal collusion strategy is the result of (38).

2) If $p > 1/2$: $\Pr_Y[1|P = p, \Theta = \theta_b(p)] = 1 - g_1(1 - \theta_{c-1}(p), c, 1 - p)$, $\Pr_Y[1|P = p, \Theta = \theta_b(p), X = 1] = 1 - g_3(1 - \theta_{c-1}(p), c, 1 - p)$, and $\Pr_Y[1|P = p, \Theta = \theta_b(p), X = 0] = 1 - g_2(1 - \theta_{c-1}(p), c, 1 - p)$. Taking into account the symmetry of the binary entropy function $h_b(\cdot)$, it is easy to see that for the optimum strategy $\theta_{c-1}(p) = 1 - \theta_1(1 - p)$.

■
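The line search of (38) can be illustrated as follows. This is a sketch only: it assumes the three functions $g_1$, $g_2$, $g_3$ given in Prop. 9 for the strategy $\theta_a(p) = (0, \theta, 0, \ldots, 0, 1)^T$ and replaces a proper line search by an exhaustive grid search over $[0,1]$; the names `objective` and `line_search` are hypothetical.

```python
import math

def h_b(x):
    """Binary entropy in bits, with the convention 0*log(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def objective(theta, c, p):
    """I(Y; X | P = p) for theta_a(p) = (0, theta, 0, ..., 0, 1)."""
    g1 = theta * c * p * (1 - p) ** (c - 1) + p**c        # Pr_Y[1 | p]
    g2 = theta * (1 - p) ** (c - 1) + p ** (c - 1)        # Pr_Y[1 | X = 1, p]
    g3 = theta * (c - 1) * p * (1 - p) ** (c - 2)         # Pr_Y[1 | X = 0, p]
    return h_b(g1) - p * h_b(g2) - (1 - p) * h_b(g3)

def line_search(c, p, n=20000):
    """Grid point of [0, 1] minimizing the objective of (38)."""
    best = min(range(n + 1), key=lambda i: objective(i / n, c, p))
    return best / n

# Example with c = 4 and p slightly below eta_4 (about 0.2578), inside [1/c, eta_c].
theta_a = line_search(4, 0.255)
rate_a = objective(theta_a, 4, 0.255)
```

For $p$ in $[1/c, \eta_c]$ the search is expected to end on the boundary $\theta = 1$, in line with Lemma 2, while leaving a small but nonzero mutual information.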

Let us make some comments about the optimal Class-D attack.

This attack may seem somewhat counterintuitive. The simplest strategy when the colluders know $p$ would be to generate a new sequence independent from the embedded sequences: $\theta_\sigma(p) = p$. However, when all the colluders' symbols are the same, they cannot generate the desired output. This is why this simple strategy is indeed not the worst.

Fig. 8(a) shows the value of the optimal parameter $\theta_1(p)$ for $p \notin [\eta_c, 1 - \eta_c]$. A corollary of Prop. 9 is that $r^{D}_{simple}(c, p) = r^{D}_{simple}(c, 1 - p)$. Fig. 8(b) shows $r^{D}_{simple}(c, p)$ for $p \in [0, 0.5]$ and different numbers of colluders, and Fig. 5 shows the achievable rate for the Tardos and flat pdfs compared to the rates achievable under the other classes of attacks. Surprisingly, $R^{D}_{simple}(c)$ is not null, although its decrease seems to be exponentially fast, as for the joint decoder. For every $c$, the capacity-achieving pdf is a symmetric distribution made of two Dirac deltas located at the values of $p$ maximizing $r^{D}_{simple}(c, p)$.

In the interval $p \in [\eta_c, 1 - \eta_c]$, the optimal collusion strategy is given by any vector $\theta$ in the intersection between a hyperplane and the feasible set $\mathcal{P}_{D}(c)$. Hence, the solution is not unique. Yet the problem is convex, and all the solutions cancel the achievable rate. Notice that the minimum collusion size for nullifying the achievable rate is $c = 3$. As proved in Appendix B-A, for $c = 3$ this can be achieved only for the singleton $p = 1/2$, and the resulting worst collusion is the minority collusion strategy.

These results show the need for time-sharing if we want to be protected against malicious attacks based on Class-D collusion strategies. For instance, a codebook with a fixed value $p = 1/2$ is a bad idea, since colluders can always nullify the rate as long as there are at least 3 of them.

V. Conclusion 

In this paper we have carried out a performance assessment of probabilistic traitor tracing codes from an information-theoretic point of view. From our investigation, considering four different classes of attackers with increasing power and two different classes of decoders, several important conclusions can be drawn.

Let us first list the bad news. Not knowing the embedded symbols (e.g. the Class-B, a.k.a. symmetric channel [10], or multimedia scenario [9]) does not make the colluders less powerful (see Prop. 4 for the joint decoder, and Conj. 1 for the simple decoder). The case of the joint decoder is even more hopeless: the simplest collusion attack, Class-A, is asymptotically optimal. A mixed result is the following: disclosing the secret time-sharing sequence opens the door to a powerful collusion attack but, surprisingly, it does not render the code completely useless, since the achievable rate is indeed strictly positive.

The good news is scarce: the time-sharing sequence plays a key role in the performance of the probabilistic traitor tracing code, offering a polynomial decrease of the achievable rate instead of an exponential decay. The achievable rate of the simple decoder is not much smaller than that of the joint decoder. This is good news because the complexity of the simple decoder is in $O(n)$, whereas that of the joint decoder is almost in $O(n^{c})$, and in some scenarios $n$ can be very large. On the other hand, we have focused on the study of two particular codes but, as we have seen, their performance is really close to that of the optimal code (asymptotically the same), obtained in [10] for the joint decoder. Furthermore, the codes studied here make use of a fixed time-sharing distribution, whereas for the capacity-achieving codes it is strongly dependent on the number of colluders.

The problem of finding the optimal time-sharing distribution for the simple decoder still remains open. However, the results of this paper suggest that no big improvement will be brought about over the existing Tardos pdf, especially when a large number of colluders is concerned. Our future work will investigate the trade-off between the complexity and the efficiency of the decoder, proposing new traitor tracing decoding algorithms.
Appendix A

Proofs of the propositions about the joint decoder

A. Proof of Prop. 3

The expression (19) of the rate $R^{A}_{joint}(c)$ can be rewritten in terms of a double expectation:

$$R^{A}_{joint}(c) = c^{-1} E_P\left[E_{S_c}[h_b(P) - h_b(S_c)|P = p]\right],$$

where $S_c$ is a random variable distributed as a binomial $B(c, p)$ but divided by $c$. Thus, its expectation equals $p$ and its variance $p(1 - p)c^{-1}$. For a given $p \in (0, 1)$, we have:

$$h_b(S_c) = h_b(p) + (S_c - p) h_b'(p) + (S_c - p)^{2} h_b''(p)/2 + o_{S_c}((S_c - p)^{2}),$$

where $o_{S_c}(\phi(S_c))$ means that, statistically, the term $\phi(S_c)$ is getting smaller and smaller, in the sense that $\forall \epsilon > 0$, $\Pr_{S_c}[|\phi(S_c)| > \epsilon] \to 0$ as $c \to \infty$. Taking the expectation conditioned on $P = p$, and the natural logarithm in $h_b(\cdot)$, we have:

$$E_{S_c}[h_b(P) - h_b(S_c)|P = p] = -E_{S_c}[S_c - p]\, h_b'(p) - E_{S_c}[(S_c - p)^{2}]\, h_b''(p)/2 - o(E_{S_c}[(S_c - p)^{2}])$$
$$= -\frac{p(1 - p)}{2c} h_b''(p) + o(p(1 - p)c^{-1}) \quad (39)$$
$$= \frac{1}{2\ln(2)c} + p(1 - p)\, o(c^{-1}). \quad (40)$$

Therefore, we can write (in natural units) that $R^{A}_{joint}(c) = \frac{1}{2\ln(2)c^{2}} + E_P[p(1 - p)]\, o(c^{-2})$, and Prop. 3 follows.

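The asymptotic behaviour can be probed numerically. The sketch below evaluates the double expectation of this proof for the Tardos pdf and compares $c^{2} R^{A}_{joint}(c)$ with $1/(2\ln 2) \approx 0.72$; the quadrature (midpoint rule after $p = \sin^2 t$) and the function names are illustrative assumptions, and convergence is only observed, not proven, by such a computation.

```python
import math

def h_b(x):
    """Binary entropy in bits, with the convention 0*log(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def rate_joint_A(c, n=2000):
    """c^{-1} E_P[ h_b(P) - E[h_b(S_c) | P] ] for the Tardos pdf, in bits."""
    step = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        p = math.sin((i + 0.5) * step) ** 2
        # S_c = Binomial(c, p) / c; the terms s = 0 and s = c contribute h_b = 0.
        e_h = sum(math.comb(c, s) * p**s * (1 - p) ** (c - s) * h_b(s / c)
                  for s in range(1, c))
        total += h_b(p) - e_h
    return total / (n * c)

limit = 1.0 / (2.0 * math.log(2.0))
scaled = {c: c * c * rate_joint_A(c) for c in (2, 5, 20)}
```

Comparing the entries of `scaled` with `limit` shows how quickly the $1/c^{2}$ regime is approached for small collusion sizes.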

B. Proof of Prop. 4

The proof uses the following two lemmas:

Lemma 3: Class-B collusion attacks have the following property:

$$\Pr_Y[1|P = p] = 1 - \Pr_Y[1|P = 1 - p]. \quad (41)$$

This is easily proven with the change of variables $p' = 1 - p$ and $\sigma' = c - \sigma$ in (0).

Lemma 4: If $f(p)$ is symmetric, i.e. $f(p) = f(1 - p)$, $\forall p \in (0, 1)$, and $q^{(k-1)}(p) = 1 - q^{(k-1)}(1 - p)$, $\forall p \in [0, 1]$, then $B^{(k)}(\sigma) = 1/B^{(k)}(c - \sigma)$.

Again, the change of variables $p' = 1 - p$ and $\sigma' = c - \sigma$ shows that:

$$E_P[\Pr_\Sigma[\sigma|P = p]] = E_P[\Pr_\Sigma[c - \sigma|P = p]] \quad (42)$$

$$E_P\left[\Pr_\Sigma[\sigma|P = p] \log\frac{1 - q^{(k-1)}(p)}{q^{(k-1)}(p)}\right] = -E_P\left[\Pr_\Sigma[c - \sigma|P = p] \log\frac{1 - q^{(k-1)}(p)}{q^{(k-1)}(p)}\right]. \quad (43)$$

Thus, $B^{(k)}(\sigma) = 1/B^{(k)}(c - \sigma)$.

These two lemmas show that Class-B is closed under the iteration defined in the proposed algorithm, for any symmetric pdf $f(p)$. In other words, if $\theta^{(k)} \in \mathcal{P}_{B}(c)$, then so is $\theta^{(k+1)}$. Since the Blahut-Arimoto algorithm converges to the minimum achievable rate whatever the initial vector $\theta^{(0)}$, and in particular for $\theta^{(0)} \in \mathcal{P}_{B}(c)$, we can conclude that Class-B colluders can lead the worst case collusion.

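C. Proof of Prop. 5

We compute the gradient of the mutual information with respect to the parameters of the collusion model $\theta_\sigma$, $\sigma \in [c-1]$. For the first term in the rhs of (9):

$$\frac{\partial}{\partial \theta_\sigma} H(Y|P = p) = \Pr_\Sigma[\sigma|P = p] \log\left(\frac{1 - \Pr_Y[1|P = p]}{\Pr_Y[1|P = p]}\right). \quad (44)$$

For the conditional entropy:

$$\frac{\partial}{\partial \theta_\sigma} H(Y|\Sigma, P = p) = \sum_{\sigma'} \frac{\partial}{\partial \theta_\sigma}\left(H(Y|\Sigma = \sigma') \Pr_\Sigma[\sigma'|P = p]\right) = \Pr_\Sigma[\sigma|P = p] \log\left(\frac{1 - \theta_\sigma}{\theta_\sigma}\right). \quad (45)$$

By combining (44) and (45), we obtain the expression

$$\frac{\partial}{\partial \theta_\sigma} I(Y; \Sigma|P = p) = \Pr_\Sigma[\sigma|P = p] \log\left(\frac{\theta_\sigma (1 - \Pr_Y[1|P = p])}{(1 - \theta_\sigma) \Pr_Y[1|P = p]}\right).$$

Hence, in order to cancel the gradient we need to fulfill $\Pr_Y[1|P = p] = \theta_\sigma = \theta^{*}$, $\forall \sigma \in [c - 1]$. This condition can be written as

$$\theta^{*} = \Pr_Y[1|P = p, \Theta = \theta^{*}] = \theta^{*} \sum_{\sigma=1}^{c-1} \Pr_\Sigma[\sigma|P = p] + \Pr_\Sigma[c|P = p] = \theta^{*}(1 - (1 - p)^{c} - p^{c}) + p^{c}. \quad (46)$$

Working out this last expression, the Class-D worst case collusion results in the one stated in Prop. 5.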

D. Proof of Prop. 6

According to Section II-D3, the rate can be written as $R^{D}_{joint}(c) = E_P[r^{D}(c, P)]$. Our objective here is to show that $R^{D}_{joint}(c)$ is maximized for $f(p) = \delta(p - 1/2)$. We first insert (28) in (9) to obtain, after simplifications,

$$r^{D}(c, p) = \frac{1}{c}\left(p^{c} \log\left(\left(\frac{1 - p}{p}\right)^{c} + 1\right) + (1 - p)^{c} \log\left(\left(\frac{p}{1 - p}\right)^{c} + 1\right)\right), \quad \text{for } p \in [0, 1].$$

This function is nonnegative and symmetric: $r^{D}(c, p) = r^{D}(c, 1 - p)$. Its derivative in $p$ can readily be shown to be given, after pertinent simplifications, by the following expression:

$$r^{D\prime}(c, p) = \left(p^{c} + (1 - p)^{c}\right)\left(\frac{1 - \theta^{*}(p)}{1 - p} \log(1 - \theta^{*}(p)) - \frac{\theta^{*}(p)}{p} \log(\theta^{*}(p))\right). \quad (47)$$

This function clearly cancels for $p \in \{0, 1/2, 1\}$. We only focus on the interval $p \in (0, 1/2)$ to show that it never cancels again. There, $(1 - p)^{-1} < 2$ and $-p^{-1} < -2$. Since $\log(1 - \theta^{*}(p))$ and $\log(\theta^{*}(p))$ have negative values, we have:

$$r^{D\prime}(c, p) > 2\left(p^{c} + (1 - p)^{c}\right)\left((1 - \theta^{*}(p)) \log(1 - \theta^{*}(p)) - \theta^{*}(p) \log(\theta^{*}(p))\right). \quad (48)$$

Knowing that $0 < \theta^{*}(p) < 1/2$ on the interval $p \in (0, 1/2)$ and that $(1 - x) \log(1 - x) - x \log(x)$ is positive for $0 < x < 1/2$, it appears that the derivative is strictly positive over $p \in (0, 1/2)$. This proves that $r^{D}(c, p)$ is strictly increasing on this interval and reaches a unique maximum at $p = 1/2$.

Appendix B

Proofs of the propositions about the simple decoder

A. Proof of Lemma 1

We first redefine (12) and (13) as:

$$\Pr_Y[1|X = 1, P = p] = \theta^{T} q_{s1},$$
$$\Pr_Y[1|X = 0, P = p] = \theta^{T} q_{s0}, \quad (49)$$

with $q_{s1}$ and $q_{s0}$ defined in (36) and (37). The $\sigma$-th component of $q_{s1}$ and $q_{s0}$ is also given by $\Pr_\Sigma[\sigma|P = p]\,\sigma/(cp)$ and $\Pr_\Sigma[\sigma|P = p](c - \sigma)/(c(1 - p))$, respectively. A necessary and sufficient condition for achieving $I(Y; X|P = p) = 0$ is that $\Pr_Y[y|X = 1, P = p] = \Pr_Y[y|X = 0, P = p]$. Taking into account the identities above, this can be expressed as

$$J(\theta) \triangleq \theta^{T}(q_{s1} - q_{s0}) = 0. \quad (50)$$

TABLE III

Values of $\eta_c$.

| $c$ | 3 | 4 | 5 | 6 | 10 | 15 | 20 |
|---|---|---|---|---|---|---|---|
| $\eta_c - 1/c$ | $1.7 \times 10^{-1}$ | $7.8 \times 10^{-3}$ | $6.3 \times 10^{-4}$ | $4.5 \times 10^{-5}$ | $2.3 \times 10^{-10}$ | $< \epsilon$ | $< \epsilon$ |


Hence, we must find at least one vector $\theta \in \mathcal{P}_{D}(c)$ orthogonal to $(q_{s1} - q_{s0})$, with $\mathcal{P}_{D}(c)$ defined in (1). Taking into account the linearity of the scalar product, and that $\theta_0 = 0$, $\theta_c = 1$ by the marking assumption, $J(\theta, p)$ can be written as a convex conical combination of scalar products:

$$J(\theta, p) = \rho_c(p) + \sum_{i=1}^{c-1} \theta_i\, \rho_i(p), \quad \theta_i \in [0, 1], \quad (51)$$

where

$$\rho_i(p) = e_{i+1}^{T}(q_{s1} - q_{s0}) = \binom{c}{i} p^{i-1}(1 - p)^{c-i-1}(i/c - p), \quad \forall i \in [c], \quad (52)$$

with $e_i$ the $i$-th canonical vector.

Note that, on the interval $[0, 1/c]$, only $\rho_0(p)$ has negative values, but this term is excluded from the sum since $\theta_0 = 0$. Hence, (50) can't be satisfied on this interval. In the same way, $\rho_1(p)$ is the only term producing negative values over the interval $[1/c, 2/c]$. Therefore, we have the lower bound:

$$J(\theta, p) \geq J(\theta_{low}, p) = \rho_1(p) + \rho_c(p), \quad p \in [1/c, 2/c], \quad (53)$$

with $\theta_{low} = (0, 1, 0, \ldots, 0, 1)^{T}$.

For $c = 3$, $J(\theta_{low}, p) = (2p - 1)^{2} \geq 0$. Therefore, it is not possible to find any vector $\theta \in \mathcal{P}_{D}(c)$ orthogonal to $(q_{s1} - q_{s0})$, except if $p = 1/2$, and then $\theta_{low} = (0, 1, 0, 1)^{T}$ (i.e. a minority vote) cancels the mutual information.

For $c > 3$, $J(\theta_{low}, p) = (1 - p)^{c-2}(1 - cp) + p^{c-1}$ is positive for $p = 1/c$ and negative for $p = 2/c$. Therefore, there exists some $\eta_c \in [1/c, 2/c]$ such that, for $p > \eta_c$, $J(\theta_{low}, p)$ is negative. The vector $\theta = (0, \ldots, 0, 1)$ gives $J(\theta, p) = \rho_c(p) \geq 0$, $\forall p \in [0, 1]$. Therefore, by continuity, there exists at least one vector $\theta$ satisfying (50) and thus canceling the mutual information. Conversely, for $p < \eta_c$, (50) cannot be satisfied. Moreover, $J(\theta_{low}, p)$ can be shown to be negative in the whole interval $[\eta_c, 1/2]$, for which $\rho_c(p)$ is strictly positive. Hence, (50) can be satisfied in this whole interval.

As $c$ increases, $\eta_c$ asymptotically approaches $1/c$ (see Tab. III). Intuitively, this is explained as follows: the behavior of $J(\theta_{low}, p)$ over $[1/c, 2/c]$ is dominated by the term $\rho_1(p)$, which is strictly decreasing on this interval and equal to zero at $p = 1/c$. This justifies why $\lim_{c \to \infty} \eta_c - 1/c = 0$. To be more rigorous, let us first denote $u = cp$ with $u \in [1, 2]$. In the interval $p \in [1/c, 2/c]$, the polynomial $J(\theta_{low}, p)$ in (53) can be expressed as $J(\theta_{low}, u) = (1 - u/c)^{c-2}(1 - u) + (u/c)^{c-1}$. For $u \in [1, 2]$, $J(\theta_{low}, u) \leq (1 - 2/c)^{c-2}(1 - u) + (2/c)^{c-1}$, which cancels for $u_c = 1 + (2/c)^{c-1}/(1 - 2/c)^{c-2}$, and $\lim_{c \to \infty} u_c = 1$. Since $1/c \leq \eta_c \leq u_c/c$, then $\lim_{c \to \infty} \eta_c - 1/c = 0$. From the expression of $u_c$, we can write $\eta_c = 1/c + O(1/c^{c}) = 1/c + o(1/c)$ since $c > 2$.

The same rationale holds on the interval $[1 - 2/c, 1 - 1/c]$, where all the scalar products have negative values except $\rho_{c-1}(p)$, hence a lower bound for (51) is:

$$J(\theta, p) \geq \sum_{i=1}^{c-2} \rho_i(p) + \rho_c(p).$$

We can simplify the lower bound into $p^{c-2}(1 - c(1 - p)) + (1 - p)^{c-1}$, which is the symmetric version of the first bound. Hence, for $p > 1 - \eta_c$, it is not possible to cancel the mutual information.

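B. Proof of Lemma 2

For the sake of simplicity, we replace the notation $P = p$ by $p$ and $X = x$ by $x$ in the sequel. This appendix concerns the worst case for values of $p$ outside the interval $[\eta_c, 1 - \eta_c]$, i.e. $\Pr_Y[1|p] \neq \Pr_Y[1|0, p] \neq \Pr_Y[1|1, p]$ necessarily. Denote by $\nabla I(Y; X|p)(\sigma)$ the derivative with respect to the parameter $\theta_\sigma$ of the collusion model:

$$\nabla I(Y; X|p)(\sigma) = \Pr_\Sigma[\sigma|p]\, h_b'(\Pr_Y[1|p]) - p \Pr_\Sigma[\sigma|1, p]\, h_b'(\Pr_Y[1|1, p]) - (1 - p) \Pr_\Sigma[\sigma|0, p]\, h_b'(\Pr_Y[1|0, p]) \quad (54)$$

with $h_b'(x) = \log\frac{1 - x}{x}$, the derivative of the binary entropy, which is strictly decreasing. This simplifies to

$$\nabla I(Y; X|p)(\sigma) = \Pr_\Sigma[\sigma|p]\left(h_b'(\Pr_Y[1|p]) - \frac{\sigma}{c} h_b'(\Pr_Y[1|1, p]) - \frac{c - \sigma}{c} h_b'(\Pr_Y[1|0, p])\right) \quad (55)$$
$$= \Pr_\Sigma[\sigma|p]\, K_1(p, c)\,(\sigma - K_2(p, c)) \quad (56)$$

with

$$K_1(p, c) = c^{-1}\left(h_b'(\Pr_Y[1|0, p]) - h_b'(\Pr_Y[1|1, p])\right) \quad (57)$$

$$K_2(p, c) = c\, \frac{h_b'(\Pr_Y[1|0, p]) - h_b'(\Pr_Y[1|p])}{h_b'(\Pr_Y[1|0, p]) - h_b'(\Pr_Y[1|1, p])}. \quad (58)$$

For the parameters of the collusion attack $\theta^{*}_\sigma$ (except $\theta_0 = 0$ and $\theta_c = 1$), there are three possibilities:

• if $\theta^{*}_\sigma \in\, ]0, 1[$, then $\nabla I(Y; X|p)(\sigma) = 0$,
• if $\theta^{*}_\sigma = 0$, then $\nabla I(Y; X|p)(\sigma) \geq 0$,
• if $\theta^{*}_\sigma = 1$, then $\nabla I(Y; X|p)(\sigma) \leq 0$.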
From now on, we detail the case of p G [0,?y c ), but the case of the interval [1 — r) c , 1] can be deduced by symmetry. 
Appendix IB-AI shows that J(9,p), which was defined in (l50l l. is positive for p E [0, rj c ). This implies that Pry[l|0,p] < 
Pry[l|p] < Pry[l|l,p]. Since h' b (x) is strictly decreasing, < Ki(p,c) and < K 2 (p,c) < c. Therefore, if 9* — 1 (resp. 
0) then a < K 2 (p, c) (resp. K 2 (p, c) < a), and if 9* K ^ p , G [0, 1] then ivT 2 (p, c) € {1, 2, c - 1}. In the sequel we look 
for closer bounds on K 2 (p, c). 

Bound #1: 1 < K 2 (p, c) < c. This amounts to prove that (0, . . . , 0, 1) and (0, 1, ... , 1) do not minimize I(Y;X\p). The 
first choice raises a contradiction: Pry[l|0,p] = implies that VI{Y\X\p)(a) < and necessarily 9* = 1. The second 
choice also leads to a contradiction: Pry[l|l,p] = 1 implies that VJ(F; X\p)(a) > and necessarily 9* = 0. 

Bound #2: 1 < K 2 (p,c) < 2. Let us define A(p) = 6 T V I(Y; X\p). According to (H), it follows that: 

A(p) =g(Vr Y [l\p])-pg(Vi Y [l\l,p})-(l-p)g(Vi Y [l\0,p}), (59) 

with g(x) = xh' b (x). As g(x) is strictly concave, A(p) > for any p G (0, rj c ). With the help of d56l >. A(p) can also be 
written as: 

A(p)=K 1 (p,c) lc-p c + &Pts[tr\p]-K2(p,c) lp c + ^ Pr s [a|p] . (60) 

\ 0<a<K 2 (p,c) \ 0<a<K 2 {p,c) J J 

Since A(p) > 0, then K 2 (p,c) < B(K 2 (p,c),p,c) with 

B(K,p,c)= — ^ (61) 

In the following we will make use of the next lemma, which is proved at the end of the appendix. 

Lemma 5: For 1 < K, B(K,p,c) < B(K + l,p, c). 

Therefore, $K_2(p,c) < B(c-1,p,c) = cp/(1-(1-p)^c)$. This last function is increasing with $p$, so that $K_2(p,c) < B(c-1, 3/(2c), c) = 3/\bigl(2(1-(1-3/(2c))^c)\bigr)$, $\forall p \in (0, \eta_c]$, since $\eta_c$ is never bigger than $3/(2c)$. This function is increasing with $c$ and converges to $3/\bigl(2(1-e^{-3/2})\bigr) \approx 1.93$. Thus, combining this result with Bound #1, we have $1 \le K_2(p,c) < 2$ for $p \le \eta_c$, and $\theta^*$ has the form $(0, \theta_1(p), 0, \ldots, 0, 1)^T$ on the interval $(0, \eta_c]$, as expressed in the statement of Lemma 2. The remainder of the proof refines this result in the interval $p \in [c^{-1}, \eta_c]$.
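The numerical claim behind Bound #2 can be checked with a short Python sketch. It evaluates the closed form $B(c-1,p,c) = cp/(1-(1-p)^c)$ at $p = 3/(2c)$ for a range of collusion sizes and compares it with the limit $3/(2(1-e^{-3/2}))$. The function name `b_full` and the range of $c$ are illustrative choices.

```python
import math

def b_full(p: float, c: int) -> float:
    """Closed form B(c-1, p, c) = c p / (1 - (1 - p)^c) used to bound K_2(p, c)."""
    return c * p / (1.0 - (1.0 - p) ** c)

# Limit of the bound as c grows, 3 / (2 (1 - e^{-3/2})).
limit = 1.5 / (1.0 - math.exp(-1.5))

# Bound evaluated at p = 3/(2c), the largest value eta_c can take.
bounds = {c: b_full(1.5 / c, c) for c in range(2, 201)}

for c in (2, 3, 5, 10, 50, 200):
    print(f"c = {c:3d}  bound = {bounds[c]:.4f}")
print(f"limit = {limit:.4f}")
```

On this range the values grow with $c$ and stay below the limit, which is itself below 2.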

Bound #3: $K_2(p,c) > cp$ when $p \in [c^{-1}, \eta_c]$. At most, when $\theta_1 = 1$, $\Pr_Y[1|1,p] = (1-p)^{c-1} + p^{c-1}$, which is a decreasing function over $[0, 1/2]$. Evaluated at $p = 1/c$, it gives values that decrease with $c$ and are all lower than $1/2$ when $c \ge 4$. Therefore, $\forall c \ge 4$, $\forall p \in [c^{-1}, \eta_c]$, $\Pr_Y[1|1,p]$, but also $\Pr_Y[1|p]$ and $\Pr_Y[1|0,p]$, lie in the interval $[0, 1/2]$ where the function $h_b'(x)$ is strictly convex. Since $\Pr_Y[1|p] = p\Pr_Y[1|1,p] + (1-p)\Pr_Y[1|0,p]$, this gives:

$$h_b'(\Pr_Y[1|p]) < p\,h_b'(\Pr_Y[1|1,p]) + (1-p)\,h_b'(\Pr_Y[1|0,p]). \tag{62}$$

Using (58) and (62), it results that $K_2(p,c) > cp$. Hence, we can conclude that $\theta^* = (0, 1, 0, \ldots, 0, 1)^T$ when $p \in [c^{-1}, \eta_c]$ and $c \ge 4$.
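The two ingredients of Bound #3 can be illustrated numerically. The Python sketch below assumes the binomial model $\Pr_\Sigma[\sigma|p] = \binom{c}{\sigma}p^\sigma(1-p)^{c-\sigma}$ and natural logarithms, and it only evaluates the left end $p = 1/c$ of the interval, because $\eta_c$ is not computed here. The helper names are hypothetical.

```python
import math

def dhb(x):
    """h_b'(x) = log((1 - x) / x), natural logarithm."""
    return math.log((1.0 - x) / x)

def pmf(n, s, p):
    """Binomial probability of s ones among n symbols of bias p."""
    return math.comb(n, s) * p**s * (1.0 - p)**(n - s)

def k2(theta, p):
    """K_2(p, c) of (58) for the attack vector theta = (theta_0, ..., theta_c)."""
    c = len(theta) - 1
    p_y = sum(theta[s] * pmf(c, s, p) for s in range(c + 1))
    p_y0 = sum(theta[s] * pmf(c - 1, s, p) for s in range(c))
    p_y1 = sum(theta[s] * pmf(c - 1, s - 1, p) for s in range(1, c + 1))
    return c * (dhb(p_y0) - dhb(p_y)) / (dhb(p_y0) - dhb(p_y1))

def pr_y1_given_1(p, c):
    """Pr_Y[1|1,p] = (1 - p)^(c-1) + p^(c-1) for theta = (0, 1, 0, ..., 0, 1)."""
    return (1.0 - p) ** (c - 1) + p ** (c - 1)

def candidate(c):
    """Attack vector (0, 1, 0, ..., 0, 1) of length c + 1."""
    return [0.0, 1.0] + [0.0] * (c - 2) + [1.0]

for c in (3, 4, 5, 6):
    print(c, pr_y1_given_1(1.0 / c, c))
for c in (4, 5, 6):
    print(c, k2(candidate(c), 1.0 / c))
```

For $c = 3$ the first quantity exceeds $1/2$, which is why that case is handled separately. The gap $K_2(p,c) - cp$ at $p = 1/c$ shrinks quickly with $c$, so a floating-point check of this kind is only meaningful for small $c$.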

We now address the case of $c = 3$ and we verify that $\theta^* = (0, 1, 0, 1)^T$ when $p \in [1/3, 1/2]$ (recall from Appendix A that $\eta_3 = 1/2$). With this choice of $\theta^*$, $\Pr_Y[1|1,p] = 1 - \Pr_Y[1|0,p]$, which yields that $K_2(p,3) > 1$ if $h_b'(\Pr_Y[1|p])/h_b'(\Pr_Y[1|0,p]) < 1/3$. Since both derivatives equal 0 at $p = \eta_3 = 1/2$, we apply l'Hôpital's rule twice to obtain:

$$\lim_{p\to 1/2} \frac{h_b'(\Pr_Y[1|p])}{h_b'(\Pr_Y[1|0,p])} = \lim_{p\to 1/2} \frac{\mathrm{d}\Pr_Y[1|p]/\mathrm{d}p}{\mathrm{d}\Pr_Y[1|0,p]/\mathrm{d}p} = \lim_{p\to 1/2} \frac{\mathrm{d}^2\Pr_Y[1|p]/\mathrm{d}p^2}{\mathrm{d}^2\Pr_Y[1|0,p]/\mathrm{d}p^2} = 0. \tag{63}$$

This shows that $K_2(p,3) > 1$ in an interval $[a, 1/2]$. Remarkably, it appears that $a = 1/3$.
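The case $c = 3$ lends itself to a direct numerical check. The Python sketch below assumes the binomial model, under which $\theta^* = (0,1,0,1)^T$ gives $\Pr_Y[1|0,p] = 2p(1-p)$ and $\Pr_Y[1|p] = 3p(1-p)^2 + p^3$, and uses natural logarithms. The function names and the grid are illustrative; the point $p = 1/2$ is excluded because the ratio is a 0/0 form there.

```python
import math

def dhb(x):
    """h_b'(x) = log((1 - x) / x), natural logarithm."""
    return math.log((1.0 - x) / x)

def ratio(p):
    """h_b'(Pr_Y[1|p]) / h_b'(Pr_Y[1|0,p]) for c = 3 and theta = (0, 1, 0, 1)."""
    p_y0 = 2.0 * p * (1.0 - p)
    p_y = 3.0 * p * (1.0 - p) ** 2 + p ** 3
    return dhb(p_y) / dhb(p_y0)

def k2_c3(p):
    """K_2(p, 3) from (58), using h_b'(Pr_Y[1|1,p]) = -h_b'(Pr_Y[1|0,p])."""
    return 1.5 * (1.0 - ratio(p))

# Grid over [1/3, 0.499].
lo, hi, steps = 1.0 / 3.0, 0.499, 200
grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
k2_values = [k2_c3(p) for p in grid]

print("K_2 at p = 1/3   :", k2_values[0])
print("K_2 at p = 0.499 :", k2_values[-1])
print("ratio at p = 0.499:", ratio(0.499))
```

On this grid $K_2(p,3)$ stays above 1, only marginally so at $p = 1/3$, and the ratio becomes small as $p$ approaches $1/2$, in line with the limit in (63).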
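Proof of Lemma 5: For fixed $(p,c)$, if $K \ge c$, $B(K,p,c)$ is constant. Otherwise, $B(K+1,p,c) - B(K,p,c)$ has the same sign as $\Delta = K + 1 - \bigl(c\,p^c + \sum_{0<\sigma\le K} \sigma \Pr_\Sigma[\sigma|p]\bigr) / \bigl(p^c + \sum_{0<\sigma\le K} \Pr_\Sigma[\sigma|p]\bigr)$. The successive derivations hold:

$$\Delta = K + 1 - \frac{c\,p^c + \sum_{0<\sigma\le K} \sigma \Pr_\Sigma[\sigma|p]}{p^c + \sum_{0<\sigma\le K} \Pr_\Sigma[\sigma|p]} \ge 1 - \frac{c\,p^c}{p^c + \sum_{0<\sigma\le K} \Pr_\Sigma[\sigma|p]} \ge 1 - \frac{c\,p^c}{p^c + c\,p\,(1-p)^{c-1}} \ge 0. \tag{64}$$

The last inequality holds provided that $p \le g(c) = 1/\bigl(1 + (1-1/c)^{1/(c-1)}\bigr)$. $g(\cdot)$ is a decreasing function and $\lim_{c\to\infty} g(c) = 1/2$. Thus, $\forall c \ge 3$, we have $p \le \eta_c \le g(c)$, which proves the lemma.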
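Lemma 5 and the properties of $g$ can be checked numerically with the Python sketch below. It assumes the binomial model for $\Pr_\Sigma[\sigma|p]$, takes the sums in $B(K,p,c)$ over $0 < \sigma \le K$, and restricts $K$ to $1 \le K \le c-1$. The grid over $p \in (0, g(c)]$ and the function names are illustrative choices.

```python
import math

def pmf(c, s, p):
    """Binomial probability Pr_Sigma[s|p] for a collusion of size c."""
    return math.comb(c, s) * p**s * (1.0 - p)**(c - s)

def b(k, p, c):
    """B(K, p, c) of (61), evaluated for 1 <= K <= c - 1."""
    num = c * p**c + sum(s * pmf(c, s, p) for s in range(1, k + 1))
    den = p**c + sum(pmf(c, s, p) for s in range(1, k + 1))
    return num / den

def g(c):
    """Threshold g(c) = 1 / (1 + (1 - 1/c)^(1/(c-1)))."""
    return 1.0 / (1.0 + (1.0 - 1.0 / c) ** (1.0 / (c - 1)))

def lemma5_holds(c, steps=50, tol=1e-12):
    """Check B(K) <= B(K+1) for K = 1..c-2 on a grid of p in (0, g(c)]."""
    for i in range(1, steps + 1):
        p = g(c) * i / steps
        for k in range(1, c - 1):
            if b(k, p, c) > b(k + 1, p, c) + tol:
                return False
    return True

for c in (3, 4, 8, 12):
    print(c, g(c), lemma5_holds(c))
```

The check covers every $p \le g(c)$ on the grid, which is a larger range than the interval $p \le \eta_c$ needed in the proof.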